TIGSource Forums > Developer > Technical (Moderator: ThemsAllTook) > The most remarkable bugs you've encountered
Author Topic: The most remarkable bugs you've encountered  (Read 7881 times)
ThemsAllTook
Administrator
Level 10
******



View Profile WWW
« on: January 15, 2019, 04:02:18 PM »

I was thinking of posting this in the happy or grumpy programmer room, but it doesn't really fit in either of those. Let's make this a thread for sharing stories of bugs that have left you scratching your head, or had some really incredible explanation once you figured out the root cause.



The unchanging pointer

Yesterday, I was implementing something that involved displaying a multi-line text field whose contents would change periodically. It displayed a few messages just fine, but would randomly cause a crash after being updated enough times. The crash report pointed me to my text rendering function, where strlen had returned an implausibly large value, and my text renderer was overrunning into protected memory space. Seemed like a simple null termination issue. However, I was able to verify that the string I was trying to render was definitely terminated correctly in the case where it was crashing.

Digging deeper with the debugger, I found that the line break indexes calculated by my word wrapper were referencing locations beyond the end of my string. I thought at first that I had an off-by-one error since the string I was rendering ended with a newline, but no - the index being referenced was 10 or 12 characters past where the string should have ended. It seemed like my word wrapper was using stale data that hadn't been recalculated since the string was last updated. Since the same wrapped string might be drawn many times, I had it set up to cache the results of wrapping so that it would only need to perform it once per unique string. This was done by checking whether any of the values related to wrapping had changed since it was last calculated - font, string, wrapping behavior, or wrap width. The string had definitely changed, so why wasn't it picking up on it?

The function I used to change the wrapped string did this (some irrelevant details omitted):

Code:
void setString(const char * newString) {
    free(myString);
    myString = strdup(newString);
}

(Sidenote: For those not familiar with the C standard library, strdup allocates and returns a new copy of a string.)

The function that checks whether wrapping needs to be recalculated does a pointer compare between myString and the last value that had been used for word wrapping. At the time, this seemed like a safe way to do it - strdup would allocate a whole new pointer when the string was changed, so the value would definitely be different if it changed, right?

Nope. Since I was freeing the old cached string before allocating the new copy, strdup would sometimes return the exact same pointer I had just freed, even if the contents ended up being different.

The setString function now sets a separate boolean value, which the word wrapper checks to see if it needs to rewrap. Crash is gone and everything is fixed. This goes to show how small incorrect assumptions can have subtle cascading effects, and lead to catastrophic results in some cases.
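
A minimal sketch of the fix described above, assuming a hypothetical TextField struct (the real renderer's types are not shown in this post). setString raises a flag instead of relying on the pointer value changing, and it copies the new text before freeing the old buffer, with malloc and memcpy standing in for strdup:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the text field described above.
struct TextField {
    char *text = nullptr;
    bool needsRewrap = true;  // raised whenever the text changes
    int wrapCount = 0;        // how many times wrapping was recomputed
};

void setString(TextField &field, const char *newString) {
    assert(newString != nullptr);
    // Copy first, then free, so passing field.text itself is also safe.
    std::size_t size = std::strlen(newString) + 1;
    char *copy = static_cast<char *>(std::malloc(size));
    assert(copy != nullptr);
    std::memcpy(copy, newString, size);
    std::free(field.text);
    field.text = copy;
    // The allocator may hand back the address that was just freed,
    // so the flag, not the pointer value, records the change.
    field.needsRewrap = true;
}

void wrapIfNeeded(TextField &field) {
    if (!field.needsRewrap) return;
    ++field.wrapCount;  // the real line break calculation would go here
    field.needsRewrap = false;
}
```

A version counter that the wrapper remembers would work just as well as a boolean if more than one consumer caches results from the same string.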
Logged

ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #1 on: January 16, 2019, 05:18:42 PM »

 Hand Clap

I have a story where I found a legitimate compiler bug in vc++ involving method pointers and standard tuples, but I don't have enough time tonight to do a full write-up so I'll post about it tomorrow probably.
Logged

otresnjak
Level 0
**


View Profile
« Reply #2 on: February 04, 2019, 08:07:35 PM »

Oh, man, I've had a couple of memorable ones over the years...

The Overly Helpful Installer

Many years ago at my first game job, I was asked to build Windows installers for the DLC for our game, using a then-popular installer framework that I'll leave unnamed.

Every time I delivered a set of installers, the DLC team's producer would complain that I must have included the wrong content, because the installers would just install the base game. I explained to him, incredulously, that this couldn't possibly be the case, given that the installers were the correct size for the DLC and far too small to contain the entire game, and in any case, I had of course tested them, and they worked for me.

It wasn't until he offered to reproduce the problem for me on his own PC that we discovered the problem--

He had all of the installers in the same folder. The naming convention we were using was something like this:
Setup.exe
Setup - Equine Protection DLC.exe
Setup - Mage Citadel DLC.exe
...and so on.

Removing the setup for the base game caused the problem to go away, and through trial and error, we discovered that, for some reason, the installation EXE emitted by the installer tool would, if the current folder contained an EXE whose filename was a prefix of its own, silently run it and close. So no matter which of the above EXEs tried to start, Setup.exe would be the one that actually ran.

So the solution was changing our installer naming convention a little bit, but to this day I wonder what possible reasoning could have led to this bizarre, undocumented feature.

The Disappearing Null Check

I wrote a blog post about this one a while back: http://orintresnjak.com/undefined-behavior/

Suffice to say that thoughtlessly converting a pointer to a reference in C++ can lead to some rather... confusing bugs.
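
The linked post is not reproduced here, so the following is only a general sketch of this class of bug, not the code from the blog. Dereferencing a null pointer to bind a reference is undefined behaviour in C++, so a compiler is allowed to assume a reference never refers to null and may drop a later check such as `if (&ref == nullptr)`. The names Widget, toRef and valueOr are hypothetical; the idea is to test the pointer while it is still a pointer.

```cpp
#include <cassert>

struct Widget { int value; };

// Checked conversion: fails loudly in debug builds instead of quietly
// creating a reference to nothing.
const Widget &toRef(const Widget *p) {
    assert(p != nullptr);
    return *p;
}

// Null handling happens before any reference exists, so there is no
// check on the reference for the optimiser to remove.
int valueOr(const Widget *maybe, int fallback) {
    if (maybe == nullptr) return fallback;
    return toRef(maybe).value;
}
```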
Logged

kason.xiv
Level 0
***


View Profile
« Reply #3 on: February 05, 2019, 04:31:25 PM »

I was once doing some firmware stuff at work... and me and another spent hours trying to debug some stack corruption that we finally tracked down to this:

Code:
*iteration++;

In multiple locations we were trying to dereference a pointer and increment the value it pointed to. We looked up the precedence of C operators... and sure enough, postfix ++ has higher precedence than the dereference operator. This means we were advancing the pointer itself instead of incrementing the value it pointed to, and the dereference did nothing!

Issues like these are unfortunate because they're not undefined or unspecified behaviour, so they won't trip static analysis tools or throw compiler warnings (I don't think?).

Can't imagine I'll ever make that same mistake again  Cheesy Roll Eyes
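
To make the difference concrete, here is a small sketch with two hypothetical helpers (not the firmware code, which isn't shown). The first does what `*iteration++` actually means, the second does what was intended:

```cpp
#include <cassert>

// *p++ parses as *(p++): the POINTER moves on to the next element,
// and the expression yields the element it pointed at before moving.
int readThenAdvance(int *&p) {
    assert(p != nullptr);
    return *p++;
}

// (*p)++ increments the VALUE the pointer refers to; the pointer stays put.
// Like any postfix increment, the expression yields the old value.
int incrementTarget(int *p) {
    assert(p != nullptr);
    return (*p)++;
}
```

With `int data[2] = {10, 20}; int *p = data;`, calling readThenAdvance(p) leaves data untouched and p pointing at data[1], while incrementTarget(data) turns data[0] into 11. As a bare statement, `*p++;` discards the value it reads, which some compilers may flag as an unused value when warnings are turned up.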
Logged
qMopey
Level 6
*


View Profile WWW
« Reply #4 on: February 05, 2019, 05:33:39 PM »

I had a pretty odd bug once. I was trying to create a "variant" class type in C++, that could be used as a reference to any other type. It worked by storing a void* and another pointer to a type information class.

Problem code:

Code:
int a = 5;
variant var = a;
call_generic_func(state, var);

Turns out the var was stored in some state somewhere, leaving a dangling pointer to some random stack memory where the variable `a` was. Oddly enough, 95% of the time this stack space was untouched, and the code worked as expected... Until something randomly came along and clobbered the stack space!

Took a long time to figure out, since the details were hidden deep inside a complicated templated callstack.
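
One way to avoid the dangling pointer described above is to have the variant own a copy of the value instead of pointing at the caller's stack. This is only a sketch: OwnedVariant and makeStored are hypothetical names, and the original class stored a raw void* plus its own type information class rather than std::shared_ptr and std::type_info.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>

// Type-erased value that keeps its own heap copy, so it is safe to
// store it somewhere that outlives the original variable.
class OwnedVariant {
public:
    template <typename T>
    explicit OwnedVariant(const T &value)
        : data_(std::make_shared<T>(value)), type_(&typeid(T)) {}

    // Returns a pointer to the stored value, or nullptr on a type mismatch.
    template <typename T>
    const T *get() const {
        assert(data_ != nullptr);
        if (*type_ != typeid(T)) return nullptr;
        return static_cast<const T *>(data_.get());
    }

private:
    std::shared_ptr<const void> data_;
    const std::type_info *type_;
};

// The local `a` dies when this function returns, but the copy does not.
OwnedVariant makeStored() {
    int a = 5;
    return OwnedVariant(a);
}
```

The trade-off is an allocation per variant; a non-owning reference type is still fine as long as it never gets stored past the call it was passed to.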
Logged
ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #5 on: February 05, 2019, 06:06:59 PM »

I don't really understand why unary & and * don't have max precedence, but hey, I didn't invent C++ ¯\_(ツ)_/¯
Logged

kason.xiv
Level 0
***


View Profile
« Reply #6 on: February 05, 2019, 06:20:00 PM »

Quote from: ProgramGamer
I don't really understand why unary & and * don't have max precedence, but hey, I didn't invent C++ ¯\_(ツ)_/¯

Yeah honestly. I've always been baffled that that's not the case.
Logged
Crimsontide
Level 5
*****


View Profile
« Reply #7 on: February 05, 2019, 09:57:11 PM »

Quote from: kason.xiv
I was once doing some firmware stuff at work... and me and another spent hours trying to debug some stack corruption that we finally tracked down to this:

Code:
*iteration++;

In multiple locations we were trying to dereference a pointer and increment the value it pointed to. We looked up the precedence of C operators... and sure enough, postfix ++ has higher precedence than the dereference operator. This means we were advancing the pointer itself instead of incrementing the value it pointed to, and the dereference did nothing!

Issues like these are unfortunate because they're not undefined or unspecified behaviour, so they won't trip static analysis tools or throw compiler warnings (I don't think?).

Can't imagine I'll ever make that same mistake again  Cheesy Roll Eyes

*ptr++ is a pretty standard idiom in C.  Post increment returns the value before incrementing the pointer, so this returns a reference to the current item and then increments ptr.

I get that it's a tad strange but it's so common I find it hard to believe that it would lead to a bug let alone be remarkable??
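
For anyone who hasn't seen the idiom in the wild, the classic use is a copy loop. This is a generic sketch with a hypothetical copyString helper, assuming the destination buffer is large enough:

```cpp
#include <cassert>
#include <cstddef>

// Copies a NUL-terminated string, including the terminator, and returns
// the number of characters copied (not counting the terminator).
std::size_t copyString(char *dst, const char *src) {
    assert(dst != nullptr && src != nullptr);
    std::size_t count = 0;
    // Each pass reads *src, stores it to *dst, then advances both pointers.
    // The loop ends once the terminator itself has been copied.
    while ((*dst++ = *src++) != '\0') {
        ++count;
    }
    return count;
}
```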
Logged
oahda
Level 10
*****



View Profile
« Reply #8 on: February 06, 2019, 02:08:43 AM »

Because of things like this I try to always add parentheses even when they aren't necessary, just to make my intentions clear, so I would always write (*iteration)++ or *(iteration++) depending on what I mean and never *iteration++. Tongue
Logged

kason.xiv
Level 0
***


View Profile
« Reply #9 on: February 06, 2019, 06:21:12 AM »

Quote from: Crimsontide
*ptr++ is a pretty standard idiom in C.  Post increment returns the value before incrementing the pointer, so this returns a reference to the current item and then increments ptr.

Ah good point. I guess *++ptr is the more disgusting of the two.

Quote from: Crimsontide
I get that it's a tad strange but it's so common I find it hard to believe that it would lead to a bug let alone be remarkable??

Uh. Yeah it's certainly not flabbergasting. Was just the silliest bug I thought of after reading the thread title.
Logged
ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #10 on: February 06, 2019, 06:42:35 AM »

I guess ++*ptr is unambiguous in what it does too, so you could still use that form if you wanted to dereference and then increment.
Logged

qMopey
Level 6
*


View Profile WWW
« Reply #11 on: February 06, 2019, 07:01:07 AM »

Thing is, a ++pointer bug can easily hide. Sure, it's fairly common as far as string processing goes, but it's not common in modern code at all. It was preferred decades ago, back when exactly how one formed a loop had a real performance impact. That kind of bug can be deviously innocuous.

@Crimsontide be nice! Smiley
Logged
Crimsontide
Level 5
*****


View Profile
« Reply #12 on: February 06, 2019, 09:41:01 AM »

Quote from: qMopey
@Crimsontide be nice! Smiley

Sorry, wasn't trying to be mean, tone can be hard to convey online.
Logged
Daid
Level 3
***



View Profile
« Reply #13 on: February 06, 2019, 10:37:56 AM »

My most remarkable bug was AI enemies suddenly stopping without a clear reason, which I eventually traced down to a missing "const" in a C++ compare operator on my custom smart pointers. One of the smart pointers was the result of a function, and thus a temporary, which you can only get const references to. Without the const, the compiler could not use my compare operator, so it cast both smart pointers to boolean and compared those instead (my smart pointers used to have an implicit cast to bool). That caused the code to think it still had a target.

The remarkable thing about it was how long the bug was there, how few issues it actually caused, and how infrequently it would trigger.
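
A rough reconstruction of that failure mode, not the actual engine code: BuggyPtr, FixedPtr, Enemy and sameTarget are all hypothetical names. Here the operands are const references, which a non-const operator== cannot accept any more than it can accept a temporary, so the comparison quietly falls back to the implicit bool conversion.

```cpp
#include <cassert>

struct Enemy { int id; };

// Smart pointer with the flaw: operator== is missing const on both the
// parameter and the member function itself.
struct BuggyPtr {
    Enemy *raw = nullptr;
    operator bool() const { return raw != nullptr; }  // implicit cast to bool
    bool operator==(BuggyPtr &other) { return raw == other.raw; }
};

// Same class with a const-correct operator== and an explicit bool cast.
struct FixedPtr {
    Enemy *raw = nullptr;
    explicit operator bool() const { return raw != nullptr; }
    bool operator==(const FixedPtr &other) const { return raw == other.raw; }
};

// With BuggyPtr, operator== is not usable on const operands, so `a == b`
// converts both sides to bool and compares "is it set?" instead of
// "is it the same object?".
template <typename P>
bool sameTarget(const P &a, const P &b) {
    return a == b;
}
```

Two BuggyPtr values pointing at different enemies compare equal through sameTarget because both are non-null. Making the conversion explicit, as in FixedPtr, turns this kind of silent fallback into a compile error.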
Logged

Software engineer by trade. Game development by hobby.
The Tribute Of Legends Devlog Co-op zelda.
EmptyEpsilon Free Co-op multiplayer spaceship simulator
isaaktual
Level 0
*



View Profile WWW
« Reply #14 on: March 01, 2019, 06:52:06 AM »

Bugs where you go 'how did this ever work!' are always fun.

The craziest recent bug, I actually tracked down to what I'm fairly sure is a hardware fault in my laptop's SSE unit.

Over new year's I had my game engine loading in .pngs.  Every so often it would fail with a checksum error.  Cue the search for the dangling pointer or the memory corruption, or the failed read, or something.

No, everything was fine.  I ended up printing out the internal state of libpng and zlib after every line was decompressed.  I found that, once out of every hundred thousand lines or so, a few bits would randomly flip in the adler32 checksum.  Calculating the checksum myself gave the right answer every time.  Copying and pasting the routine from zlib into my code also showed no issue.

Isolated it down to a test program that fed the exact same block of data to my system zlib a couple hundred thousand times.  On my laptop, it would eventually fail.  On other machines, totally fine.  Worked out that the system zlib must have been compiled with some really crazy optimisations, as the adler32 routine was super-vectorised using SSE.  A zlib I compiled myself results in pretty standard scalar code.

The best I managed was to isolate it down to a block of about 24 instructions.  Exact same data through that block repeatedly,  eventually I'd get a wrong result.  I guess my poor laptop is just broken.

Solution has been to just compile my own copy of zlib without SSE and keep working.  Not sure how the manufacturer would respond to "Hey just run this test program and you see that this instruction here is broken!"
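
For reference, a plain scalar Adler-32 plus the kind of repeat test described above fits in a few lines. This is a sketch of the approach, not the original test program: adler32Scalar and countMismatches are hypothetical names, and here the scalar routine is only compared against itself, whereas the real test compared against the system zlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Straightforward scalar Adler-32: two running sums modulo 65521.
std::uint32_t adler32Scalar(const unsigned char *data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

// Feeds the same block through the checksum many times and counts how
// often the result differs from the first run. On healthy hardware this
// should always be zero.
std::size_t countMismatches(const unsigned char *data, std::size_t len,
                            std::size_t rounds) {
    const std::uint32_t expected = adler32Scalar(data, len);
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < rounds; ++i) {
        if (adler32Scalar(data, len) != expected) ++mismatches;
    }
    return mismatches;
}
```

The nine bytes of "Wikipedia" give 0x11E60398, which is a handy sanity check before pointing the loop at a suspect implementation.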

> : D
Logged
qMopey
Level 6
*


View Profile WWW
« Reply #15 on: March 01, 2019, 12:36:30 PM »

Was the loader 32-bit? Because on a 64-bit build SSE would be used anyway! Which could be super spooky.
Logged
isaaktual
Level 0
*



View Profile WWW
« Reply #16 on: March 01, 2019, 03:01:02 PM »

It was a 64-bit build.  :-)  I ended up having to look at the assembly instructions.  I think it must be a particular instruction sequence, using fairly uncommon instructions, because other than that specific case in the zlib checksum the laptop runs SSE code without any problems.  Or at least, if bits are getting flipped it must just be changing the colour of a pixel or something else harmless.
Logged
Thaumaturge
Level 10
*****



View Profile WWW
« Reply #17 on: March 02, 2019, 02:23:51 PM »

A recent bug that was a bit eerie at the time:

I was working on my side-game, Night River. Specifically, I was working on a part that involved pleading voices--all my own, I'll add.

And I had a problem: when played, they overlapped too much--and worse, when I went to the main menu they carried on for a while, even though they demonstrably stopped being selected for playing!

This was both confusing and a little unsettling--hearing my own voice plead when I was expecting quiet. o_o

In the end, I found the reason, however: in a few cases, I had mistakenly exported multiple sounds into a single file. The result was that those files held chains of sounds, that carried on longer than I expected!
Logged

Daid
Level 3
***



View Profile
« Reply #18 on: March 04, 2019, 04:24:30 AM »

Quote from: isaaktual
It was a 64-bit build.  :-)  I ended up having to look at the assembly instructions.  I think it must be a particular instruction sequence, using fairly uncommon instructions, because other than that specific case in the zlib checksum the laptop runs SSE code without any problems.  Or at least, if bits are getting flipped it must just be changing the colour of a pixel or something else harmless.

Possibly, because zlib checksum is calling a lot of the same SSE instructions in a tight loop, your CPU's logic gates that are part of SSE are heating up and failing only in that specific case.
Logged

nova++
Level 4
****


Real life space alien (not fake)


View Profile
« Reply #19 on: March 08, 2019, 03:32:09 PM »

This is a bit of an old video by now, but the effort to work out problems in my terrain system was rather... interesting.





When the noise samples start taking place beyond certain limits (this was testing a 90,000 km radius ringworld, so far beyond what the game will need to display), things get really, REALLY weird. I would have expected just a loss in precision, but instead you get complicated, shifting, geometrical patterns and I honestly have no idea why.
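
For a sense of scale on the precision loss mentioned above, this sketch measures the gap between adjacent 32-bit floats at a given magnitude. It assumes coordinates are stored as float in metres, which the post does not state, and spacingAt is a hypothetical helper. It only illustrates the expected loss of precision and does not explain the geometric patterns.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
// Hypothetical helper, not part of any engine.
float spacingAt(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

At 9.0e7 (90,000 km expressed in metres) spacingAt returns 8, so under that assumption sample positions can only be told apart in 8 m steps, while near 1.0 the gap is about 1.2e-7.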
Logged
