Fun with -fsanitize=undefined and Picolibc

zero_k · 2025-04-16T12:21:42 1744806102

Wow, this: "random() was returning values in int range rather than long." is a very nice bug find. Randomness is VERY hard to check for humans. For example, Python's binomial distribution is very bad on some inputs [1], giving widely wrong values, but nobody found it. I bumped into it when I implemented an algorithm to compute the approximate volume of solutions to a DNF, and the results were clearly wrong [2]. The algorithm is explained here by Knuth, in case you are interested [3]

[1] https://www.cs.toronto.edu/~meel/Slides/meel-distform.pdf [2] https://github.com/meelgroup/pepin [3] https://cs.stanford.edu/~knuth/papers/cvm-note.pdf

nasretdinov · 2025-04-16T09:48:58 1744796938

> String to float conversion had a table missing four values. This caused an array access overflow which resulted in imprecise values in some cases.

I've once wrote a function to parse the date format from log files that Go doesn't natively support, and forgot to add November. I quit that job in April, so I never saw any issues. However when 1st of November came my ex-colleagues saw no logs for this day, and when they found out the reason they created a hash tag #nolognovember which you can probably find somewhere to this day :)

bad_username · 2025-04-16T11:36:49 1744803409

> when 1st of November came my ex-colleagues saw no logs for this day

Faced with this symptom I would bet there was a "No" in a yaml somewhere :-)

lionkor · 2025-04-16T10:40:17 1744800017

Who needs unit tests when you have "squints lgtm"

wavemode · 2025-04-16T12:34:31 1744806871

This is kind of bug unit tests don't catch. The only way to exhaustively test that you're handling every possible input, is to loop over every possible input (which may tend to infinity).

Therein lies the importance of runtime assertions (so we can sanity check that parsing actually succeeded rather than silently failing) and monitoring (so we can sanity check that, for example, we don't ever go 24 hours without receiving data from the parsing job).

kccqzy · 2025-04-16T14:29:53 1744813793

> to loop over every possible input (which may tend to infinity).

This attitude is defeatist. The success of property based testing (see QuichCheck in Haskell or Hypothesis in Python) especially when combined with fuzzing shows that instead of looping over every possible input, looping over thousands of inputs tend to be good enough in practice to catch bugs.

Throwing infinity as a cop out is a lazy opinion held by people who don't understand infinity, or rather, the concept of infinity that's countable. Everything we model on a computer is countably infinite. When we have multiple of such countable infinite sets, the standard dovetail constructions guarantees that their union will be countable. Their Cartesian product will also be countable. You can always obtain an interesting prefix of such an infinite set for testing purposes.

wavemode · 2025-04-16T14:43:49 1744814629

Your tone implies to me that you are under the impression that I'm suggesting one should not test their software. Nothing could be further from the truth.

What I'm saying is that it's foolish not to take any measures at runtime to validate that the system is behaving correctly.

Who's to say that the logs themselves are even formatted correctly? Your software could be perfectly bug-free and you'd still have problems without knowing it, due to bugs in some other person's software. That's the point you're missing - no matter how many edge cases you account for, there's always another edge case.

imtringued · 2025-04-16T14:58:40 1744815520

It's called fuzzing.

wavemode · 2025-04-16T15:20:09 1744816809

Fuzzing does not catch all bugs. And even if it, did your software can still misbehave when all logical bugs are eliminated. Say for example the program parses correctly but uses up a large amount of RAM on certain inputs, causing occasional crashes and restarts in production. Say for example your program behaves perfectly but there's actually an occasional bug in the date formatting in the logs themselves.

So yeah, you need monitoring and assertions. A decent coverage of unit tests is good, but I wouldn't bother trying to invest in some sort of advanced fuzzing or quickcheck system. In my experience the juice isn't worth the squeeze.

Joker_vD · 2025-04-16T11:47:43 1744804063

Pfft, you know how tests for functions like this:

    func MonthToString(month int) string {
        switch month {
        case 1: return "January"
        case 2: return "February"
        ...
        case 10: return "October"
        case 12: return "December"
        default: panic(fmt.Errorf("invalid month number: %d", month))
        }
    }

are usually written? You take the switch's body, shove it into the test function, and then replace "case/return" with regexp to "assert.Equal" or something:

    func TestMonthToString(t *testing.T) {
        assert.Equal(t, "January", MonthToString(1))
        assert.Equal(t, "February", MonthToString(2))
        ...
        assert.Equal(t, "October", MonthToString(10))
        assert.Equal(t, "December", MonthToString(12))
        assert.PanicsWithError(t, "invalid month number: 13", func() { MonthToString(13) })
    }

Look ma, we got that sweet 100% code coverage!

gus_massa · 2025-04-16T15:28:51 1744817331

The solution is to add integration testing,

  for (i=1, i<13, i++) {
    assert.Equal(t, i, StringToMonth(MonthToString(i)))
  }

the reverse composition is harder to test.

debugnik · 2025-04-16T15:31:36 1744817496

Did you mean property testing?

BobbyTables2 · 2025-04-16T13:12:01 1744809121

I feel attacked! (:>

nasretdinov · 2025-04-16T15:30:52 1744817452

Nice nick name. What happened to the first one?

nasretdinov · 2025-04-16T12:34:39 1744806879

Well I guess you can have a unit test that checks that there are 12 entries :). Repeating the same list probably wouldn't have found an issue unfortunately

moefh · 2025-04-16T13:08:25 1744808905

> Passing pointers to the middle of a data structure. For example, free takes a pointer to the start of an allocation. The management structure appears just before that in memory; computing the address of which appears to be undefined behavior to the compiler.

To clarify, the undefined behavior here is that the sanitizer sees `free` trying to access memory outside the bounds of what was returned by `malloc`.

It's perfectly valid to compute the address of a struct just before memory pointed to by a pointer you have, as long as the result points to valid memory:

    void not_free(void *p) {
      struct header *h = (struct header *) (((char *)p) - sizeof(struct header));
      // ...
    }

In the case of `free`, that resulting pointer is technically "invalid" because it's outside what was returned by `malloc`, even though the implementation of `malloc` presumably returned a pointer to memory just past the header.

josephg · 2025-04-16T14:20:11 1744813211

Yeah; I used to enjoy poking through C code in well written programs. It’s amazing what gems you can find by people who really know their stuff.

I saw a string library once which took advantage of this. The library passed around classic C style char* pointers. They work in printf, and basically all C code that expects a string. But the strings had extra metadata stored before the string content. That metadata contained the string’s current length and the total allocation size. As a result, you could efficiently get a string length without scanning, append to a string, and do all sorts of other useful things that are tricky to do with bare allocations. All while maintaining support for the rest of the C ecosystem. It’s a very cool trick!

alexvitkov · 2025-04-16T14:50:35 1744815035

The Windows API uses this scheme for one of its 50 string types [1].

I"m not very fond of this design as it's easy to pass a "normal" C string, which compiles because BSTR is just a typedef to it.

You can allocate the exact same data structure, but store a pointer to the size prefix, instead of the first byte - you avoid that issue, and can still pass the data field to anything expecting a zero-terminated string:

  struct WeirdString { int size; char data[0]; };
  struct WeirdString* ws = ...;
  fopen(ws->data);

[1] BSTR - https://learn.microsoft.com/en-us/previous-versions/windows/...

bestouff · 2025-04-16T12:48:00 1744807680

> the vast bulk of sanitizer complaints came from invoking undefined or implementation-defined behavior in harmless ways

This is patently false. Any Undefined Behavior is harmful because it allows the optimizer to insert totally random code, and this is not a purely theoretical behavior, it's been repeatedly demonstrated happening. So even if your UB code isn't called, the simple fact it exists may make some seemingly-unrelated code behave wrongly.

dzaima · 2025-04-16T14:49:56 1744814996

It may theoretically be false, and probably false in some cases, but (at least temporarily) there are cases (not all! but some) where some of the current C compilers currently will never result in bad behavior even if UB actually happens, for now, and those do include some of those mentioned in the article. (not all though; e.g. the actually-used cases of signed overflow could behave badly; of course if one looks at the assembly and makes sure it's what you want, it will be,..as long as the code is always compiled by the specific version of the checked compiler)

For example, in clang/llvm, currently, doing arithmetic UB (signed overflow, out-of-range shifts, offsetting a pointer outside its allocation bounds, offsetting a null pointer, converting an out-of-range float to int, etc) will never result in anything bad, as long as you don't use it (where "using it" includes branching on or using as a load/store address or returning from a function a value derived from it, but doesn't include keeping it in a variable, doing further arithmetic, or even loading/storing it). Of course that's subject to change and not actually guaranteed by any documentation. Not a thing to rely on, but currently you won't ever need to release an emergency fix and get a CVE number for having "void *mem = malloc(10); void *tmp[1]; tmp[0] = mem-((int)two_billion + (int)two_billion); if (two_billion == 0) foo(tmp); free(mem);" in your codebase (..at least if compiling with clang; don't know about other compilers). (yes, that's an immense amount of caveats for an "uhh technically")

im3w1l · 2025-04-16T16:23:55 1744820635

> So even if your UB code isn't called, the simple fact it exists may make some seemingly-unrelated code behave wrongly.

This is fortunately not true. If it were, it would make runtime checks pointless. Consider this code

  free(ptr)
  already_freed = true;
  if (!alread_freed) {
    free(ptr)
  }

The second free would be undefined behavior, but since it doesn't run the snippet is fine.

ajross · 2025-04-16T14:34:09 1744814049

> This is patently false. Any Undefined Behavior is harmful because it allows the optimizer to insert totally random code

Undefined to who, though? Specific platforms and toolchains have always attached defined behavior to stuff the standard lists as undefined, and provided ways (e.g. toolchain-specific volatile semantics, memory barriers, intrinsic functions) to exploit that. Even things like inline assembly live in this space of dancing around what the standard allows. And real systems have been written to those tools, successfully. At the bottom of the stack, you basically always have to deal with stuff like this.

Your point is a pedantic proscription, basically. It's (heh) "patently false" to say that "Any Undefined Behavior is harmful".

Krssst · 2025-04-16T15:13:40 1744816420

Ubsan docs do mention one case where UB is defined by the implementation: floating point division by zero: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

By contrast, I'd assume any other report by ubsan to be fair game for the optimizer to do its thing and generate whatever code is going to be different from what was likely the developer's intention. If not in the current version, maybe in a future one.

ajross · 2025-04-16T15:31:52 1744817512

Unless the optimizer has rules around its own intrinsics or whatever, though. Again, this isn't a black/white issue. You need to know what specific toolchains are going to do when writing base level software like C libraries (or RTOS kernels, which is my wheelhouse). The semantics defined by the standard are not sufficient.

Krssst · 2025-04-16T15:52:50 1744818770

Memory barrier builtins, inline assembly and other builtins may not be defined by the standard itself but won't lead in themselves to undefined behavior in my understanding since those are compiler extensions and defined by the implementation. (though invalid use can lead to UB, such as passing 0 to builtin_clz or modifying registers outside of the clobber list in inline assembly)

With that being said, I would definitely expect that the small set of UB that ubsan reports about is actually undefined for the compiler that implements the sanitizer (meaning: either problematic now or problematic in some future update).

uecker · 2025-04-16T17:37:40 1744825060

I agree. The typical ubsan sanitizers (not all sanitizers though) detect serious issues which should be fixed in all cases and I would definitely consider it best practice to run testing with sanitizers (and I would also would recommend to run many in production).

LegionMammal978 · 2025-04-16T14:39:18 1744814358

Yeah, this is especially true if you're writing a libc. E.g., every libc allocator in existence invokes UB with respect to ISO C when it reads metadata placed before a malloc()ed block of memory. Doubly so, since ISO C arguably doesn't even allow a container_of() mechanism at all. At some point, you have to look at what the implementation is actually expecting of your code.

almostgotcaught · 2025-04-16T13:08:14 1744808894

> optimizer to insert totally random code

What are you even saying - what is your definition of "random code". FYI UB is exactly (one of) the places where an optimizer can insert optimized code.

arnsholt · 2025-04-16T13:31:37 1744810297

To take an example from the post: in some cases a value was computed that could overflow, but it was not used because of a later overflow check. I think the optimizer would be fully within its rights to delete the code inside the overflow check, because the computation implicitly asserts that it won't overflow (since overflow is undefined). I think this is a more or less useful way of thinking around UB: any operation you put in your program implicitly asserts that the values are such that UB won't happen. For example, dereferencing a pointer implicitly means it cannot be NULL, because derefing NULL is UB, and anything downstream of that deref which checks for NULL can be deleted.

flohofwoe · 2025-04-16T13:31:05 1744810265

Unfortunately UB is an umbrella term for all sorts of things, and some of those can be very harmful/unexpected, while others are (currently) harmless - but that may change in new compiler versions.

The typical optimization showcase (better code generation for signed integer loop counts) only works when the (undefined behaviour) signed integer overflow doesn't actually happen (e.g. the compiler is free to assume that the loop count won't overflow). But when the signed integer overflow happens all bets are off what will actually happen to the control flow - while that same signed integer overflow in another place may simply wrap around.

Another similar example is to specifically 'inject' UB by putting a `std::unreachable` into the default case of a switch statement. This enables an optimization that the compiler omits a range check before accessing the switch-case jump table. But if the switch-variable isn't handled in a case-branch, the jump table access may be out-of-bounds and there will be a jump to a random location.

In other situations the compiler might even be able to detect at compile time that the UB is triggered and simply generate broken code (usually optimizing away some critical part), or if you're lucky the compiler inserts an ud instruction which crashes the process.

moefh · 2025-04-16T13:23:00 1744809780

Not OP, but here's an example of "random code" inserted by the compiler[1]: note the assembly instruction "ud2" ("invalid opcode exception" in x86 land) instead of "ret" in not_ok().

You might think this code would be fine if address 0 were mapped to RAM, but both gcc and clang know it's undefined behavior to use the null pointer like that, so they add "random code" that forces a processor exception.

[1] https://godbolt.org/z/sK55YsGz1

anamexis · 2025-04-16T14:42:37 1744814557

That doesn't sound very random to me!

juliangmp · 2025-04-16T11:38:53 1744803533

> [...] detect places where the program wanders into parts of the C language specification [...]

Small nitpick, the UB sanitizer also has some checks specific for C++ https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html