TIGSource ForumsDeveloperTechnical (Moderator: ThemsAllTook)The most remarkable bugs you've encountered
Author Topic: The most remarkable bugs you've encountered  (Read 7877 times)
qMopey
Level 6
*


View Profile WWW
« Reply #20 on: March 09, 2019, 12:14:59 AM »

Clearly the earth is just made of worms and that’s why we see shifting geometric patterns  Corny Laugh
Logged
RoKabiumGames
Level 0
**



View Profile WWW
« Reply #21 on: May 01, 2019, 10:16:57 AM »

The most "horriblest" of bugs I've had to deal with was with a language called ProIV.

It's a high level 4GL, and it allowed you to call other programs and pass in/out parameters.

If you passed in a literal to a program, it was possible that if the program you called tried to pass back a value to that literal then it actually changed the literal value!

So after that happened, the value of 1 would not equal '1' any more, and lines like this:

myint = 6 + 1;
if (myint == 7) { print "the answer is 7"; }

Would never be true.

This was a nightmare, since it changed the literal for the whole application, not just the current program, until you logged out and back in.

It was very hard to find this bug and understand what was happening, and we wasted many hours on it.

The best part, the owning company actually argued that it was a 'feature' and not a bug. Sad
Logged

qMopey
Level 6
*


View Profile WWW
« Reply #22 on: May 01, 2019, 11:39:52 AM »

So there’s this compiler bug. It’s a proprietary compiler for some proprietary hardware. The bug is that if you pass a pointer as the third parameter, the pointer isn’t properly sent through the call when compiled to assembly. Workarounds include padding the third parameter with an unused int, or passing in a struct pointer containing many other pointers and ensuring the struct pointer isn’t the third parameter.
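For illustration, the padding workaround might look something like this in C. The function name and parameters are made up for the example; the only point is that the dummy int pushes the pointer out of the third slot.

```c
#include <stddef.h>

/* Hypothetical original shape, with the pointer as the third parameter:
   size_t sum_bytes(size_t count, int flags, const unsigned char *src); */

/* Workaround shape: an unused int occupies the third slot, so the pointer
   becomes the fourth parameter. All names here are illustrative. */
static size_t sum_bytes(size_t count, int flags, int unused_pad,
                        const unsigned char *src)
{
    size_t total = 0;
    (void)flags;       /* not needed for this toy example */
    (void)unused_pad;  /* exists only to shift the pointer's position */
    for (size_t i = 0; i < count; i++) {
        total += src[i];
    }
    return total;
}
```

Every call site then has to pass a throwaway value, e.g. sum_bytes(3, 0, 0, data), which is exactly the kind of noise that makes this workaround silly.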

Silly workarounds instead of spending time fixing the compiler bug  Shrug
Logged
ThemsAllTook
Administrator
Level 10
******



View Profile WWW
« Reply #23 on: May 14, 2019, 08:24:10 AM »

Here's a small one I just had: In some UI code, I was doing hit detection for a list of items by subtracting the click position from the top of the list, dividing by row height, and storing the result in an unsigned int. I relied on the value underflowing to detect when I had clicked above the top of the list; as in, assigning a negative number to an unsigned integer and expecting to get a large positive number. I just built my game for Android, and found that tapping above the list was acting like I was tapping the first row. It looks like on ARM, assigning a negative number to an unsigned int truncates to 0 instead of underflowing to a large value. Weird! Seems like this could cause a lot of subtle bugs, so I'll have to watch out for it in other places.
Logged

ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #24 on: May 14, 2019, 08:43:05 AM »

@qMopey
To be clear, were the workarounds provided by the compiler vendors? If so, that's some advanced avoidance to fix the actual issue.

@ThemsAllTook
Yeah, relying on underflow for that seems like not a great idea, no offense. Using a signed int and checking for a negative value would achieve the same thing, no?

In any case, it's interesting that underflow truncates your number, but I bet the compiler happens to be doing that for you instead of it being a platform thing. It seems really weird to me that it would be impossible to get an underflow on ARM.
Logged

ThemsAllTook
Administrator
Level 10
******



View Profile WWW
« Reply #25 on: May 14, 2019, 10:18:45 AM »

Yeah, relying on underflow for that seems like not a great idea, no offense.

If it's known reliable behavior, there's no reason not to use it. It either works or it doesn't. I find it important as a programmer to keep a clear picture of how my tools function and to use them as appropriate, intentionally avoiding dogmatic concepts like "probably shouldn't rely on this, but don't know why".

using a signed int and checking for a negative value would achieve the same thing, no?

Kinda. That's what I ended up doing. The reason it wouldn't be my first choice is that I'm comparing the row index with an unsigned value, which means 1) I have to explicitly cast when comparing to avoid compiler warnings, and 2) it opens me up to another category of problem due to cutting the positive numeric range in half. In practice, I'm obviously not going to have 2147483648+ rows in this list, but it's still important to keep in mind.

I'll have to dig into this deeper later today. I have a hunch that this is happening because I'm converting from float to unsigned int, and if I added an intermediate conversion via signed int, I'd still get the underflow. Both outcomes are perfectly reasonable; it's just interesting that ARM makes one decision while x86 makes the opposite one.
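A rough sketch of a conversion that avoids the question entirely. In C, converting a floating-point value to an integer type that can't represent its integral part is undefined behaviour, so a negative float going straight to unsigned int has no guaranteed result on any platform, which would be consistent with ARM and x86 disagreeing. The helper name and parameters below are hypothetical, and it assumes y grows downward so that a click above the list gives a negative offset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: maps a click's y coordinate to a row index.
   Returns false when the click is above or below the list. */
static bool row_at(float click_y, float list_top, float row_height,
                   size_t row_count, size_t *out_row)
{
    float offset = click_y - list_top;

    /* Reject bad row heights, negative offsets, and NaN while still in
       float, before any float-to-integer conversion happens. */
    if (!(row_height > 0.0f) || !(offset >= 0.0f)) {
        return false;
    }

    float row = offset / row_height;
    if (!(row < (float)row_count)) {
        return false; /* below the last row */
    }

    /* row is now in [0, row_count), so the conversion is well defined. */
    size_t index = (size_t)row;
    if (index >= row_count) {
        return false; /* guards against rounding in (float)row_count */
    }
    *out_row = index;
    return true;
}
```

The range checks happen on the float, so the result no longer depends on how a particular compiler or CPU handles an out-of-range conversion.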
Logged

ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #26 on: May 14, 2019, 11:37:42 AM »

Well, I wasn't thinking about it in terms of dogmatic "shouldn't do this" principles haha, I just thought that checking for a large number after a subtraction sounds less exact than using a signed integer and checking if it's negative. Plus, yeah, converting between signed and unsigned is probably not a big issue if they represent a quantity of UI elements Wink

Do post again with your findings though, I kinda wanna know why the behavior is different too!
Logged

Schrompf
Level 9
****

C++ professional, game dev sparetime


View Profile WWW
« Reply #27 on: May 14, 2019, 10:58:29 PM »

My bet is on "Compiler optimisation exploiting some corner case or UB in the conversion chain". ARM is two's complement just like every other CPU architecture under the sun, the CPU definitely doesn't saturate a signed integer.

[edit] The more I think about it: maybe ARM64 has a specific FloatToUnsignedInt conversion instruction that x86 doesn't have. That one might indeed truncate.
Logged

Snake World, multiplayer worm eats stuff and grows DevLog
bateleur
Level 10
*****



View Profile
« Reply #28 on: May 14, 2019, 11:51:06 PM »

Only just discovered this thread, so time to share my all-time favourite bug from 35 years of programming!

The Winding Number Bug

A friend and I had programmed a multiplayer Bomberman variant on the Amiga. It was basically done and we were testing it, but we noticed that occasionally the game would crash. Trying to work out why, we systematically tested every feature and nothing crashed. We left the game running for hours and it didn't crash. We played it for hours... and it repeatedly crashed.

Eventually, I became aware of a pattern to the crashes. If I placed a bomb, then ran around the wall tile up and right of my position, when the bomb exploded the game would crash. So presumably if I just ran up and right one space that would do it? Nope, I had to run around the tile. Up-down-up-down? Nope. What if I went up-down-up-right-right (to make the distance the same)? Nope. How about around the block to the left? Yup, crash!

We looked at each other in confusion. The crash clearly depended on the winding number around the block. That is: it was measuring whether the character's path wrapped around the wall tile or not. But this was clearly impossible! Nothing in the code had the potential to calculate this, never mind crash after doing so!

Eventually we tracked down what was going on. The game used a kind of primitive homebrew object orientation to handle tile behaviour. The way it worked is that the code number of a tile was used to look up its behaviours. Due to a typo, one of the frames of the bomb flame animation had the same object code as the teleporter tile. The crash was being caused by a character moving onto the space above a bomb explosion flame. This was because the other end of the not-really-a-teleporter couldn't be found by the game. However, the character had to arrive at exactly the right frame for this to happen, because teleportation was only checked on arrival, not when standing still.

So why the winding number effect? Well, because of the way movement in the game worked, an experienced player could buffer each move a few frames ahead (while the animation for the last thing was completing). So, if you dropped a bomb then walked around the adjacent wall tile you could quite easily do it frame perfectly... and if you did you'd arrive on exactly the right frame to trigger the not-a-teleporter bug via the flames from the bomb you'd dropped. Mystery solved!
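The tile-code collision described above can be sketched in C roughly like this. Everything here (the enum values, the function names, the flat tile array) is invented for illustration and is not the original Amiga code; it only shows how a duplicated code makes a flame frame behave like a teleporter on arrival, and how a missing partner can be handled without crashing.

```c
#include <stddef.h>

/* Hypothetical tile codes. The typo is modelled by giving one flame
   frame the same code as the teleporter. */
enum {
    TILE_EMPTY      = 0,
    TILE_TELEPORTER = 7,
    TILE_FLAME_2    = 7  /* should have been a unique code */
};

/* Returns the index of another teleporter tile, or `from` if there is
   none, so a lone "teleporter" leaves the character where it is. */
static size_t teleport_target(const int *tiles, size_t count, size_t from)
{
    for (size_t i = 0; i < count; i++) {
        if (i != from && tiles[i] == TILE_TELEPORTER) {
            return i;
        }
    }
    return from;
}

/* Behaviour lookup by tile code, run only when a character arrives on a
   tile (not while standing still), as in the story. */
static size_t on_arrival(const int *tiles, size_t count, size_t pos)
{
    if (tiles[pos] == TILE_TELEPORTER) {
        return teleport_target(tiles, count, pos);
    }
    return pos;
}
```

Because the lookup only sees the numeric code, arriving on a tile showing TILE_FLAME_2 takes the teleporter path, which is why the bug needed frame-perfect timing to show up.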
Logged

ThemsAllTook
Administrator
Level 10
******



View Profile WWW
« Reply #29 on: November 29, 2020, 08:08:39 PM »

I just ran into a small gotcha that I think is worth posting. I've written a JSON serialization library that can handle all of the primitive types you'd expect in C. I just compiled it on a Raspberry Pi, and one of my unit tests mysteriously failed. The test had to do with the chosen representation for very large integers. Since many JSON implementations store all number values as double precision floating point (the spec doesn't require this, but it notes that staying within double's range and precision gives good interoperability), I had some special code that would detect whether a 64-bit integer being serialized would be unrepresentable in double format, and would write it as a string instead of a number in those cases. On the Raspberry Pi, a number that shouldn't have been representable wasn't getting stringified.

This made me realize that my implementation depends on CPU behavior. I'm just typecasting from double to uint64_t, then testing equality and using a string representation if the values are unequal after truncation. On all of the other CPUs I've compiled this code for, storing UINT64_MAX in a double caused a truncation, but on this CPU it was exactly representable. I'm not sure if this means it's not IEEE 754 compliant, or if it just interprets it differently. In any case, it's clear that what I need to do is pick a CPU-independent threshold value and stringify all 64-bit integers above that threshold, instead of relying on typecasting to tell me about representability. Otherwise, a number could hypothetically get encoded one way on one computer, then when decoded on another one that can't represent it, I'd end up with a different actual value.
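A minimal sketch of what such a CPU-independent threshold could look like, assuming the cutoff is 2^53 - 1 (the same value JavaScript exposes as Number.MAX_SAFE_INTEGER). The macro and function names are hypothetical. A double has a 53-bit significand, so every integer up to that magnitude is exactly representable. It's also worth noting that converting a double back to uint64_t is undefined behaviour in C when the value is out of range, and UINT64_MAX rounds up to 2^64 as a double, so a round-trip test may well give different answers on different CPUs without either being wrong.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff: 2^53 - 1. Integers of this magnitude or smaller
   are exactly representable as IEEE 754 doubles. */
#define JSON_SAFE_INT_MAX INT64_C(9007199254740991)

/* True if the value can be written as a JSON number without any reader
   that uses doubles losing precision; otherwise write it as a string. */
static bool u64_safe_as_json_number(uint64_t v)
{
    return v <= (uint64_t)JSON_SAFE_INT_MAX;
}

static bool i64_safe_as_json_number(int64_t v)
{
    return v >= -JSON_SAFE_INT_MAX && v <= JSON_SAFE_INT_MAX;
}
```

This is deliberately conservative: some larger integers (powers of two, for instance) are exactly representable too, but a plain integer comparison gives the same answer on every machine.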

Discovering edge cases like this is always fun, as long as they're fixable. Fortunately, I caught this one before it had the opportunity to cause mischief in any real-world data.
Logged

Borek
Level 0
*


View Profile
« Reply #30 on: December 02, 2020, 02:17:32 PM »

Years ago (somewhere in the late nineties) I spent several days tracking a bug that appeared at completely random moments in C++ code that had been used for years and was considered stable. In the end it turned out the code was bad from the start - I was deleting an object and then using a pointer from it to do some cleaning. As long as the environment was single threaded that wasn't producing problems, as the object - despite being deleted - was still in memory.

Changing the order of the lines was all that was needed to make it work OK.
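The same pattern, sketched in C rather than C++ and with made-up struct and function names: the broken version frees the object and then reads a member out of it, and the fix is simply to read the member first.

```c
#include <stdlib.h>

/* Hypothetical types for illustration only. */
struct resource { int refs; };
struct owner    { struct resource *res; };

/* Broken order (do not do this):
       free(o);
       o->res->refs -= 1;    reads o after it has been freed
   It can appear to work for years if nothing reuses the memory in
   between, which is what made the bug look random later on. */

/* Fixed order: copy the pointer out before releasing the owner. */
static void owner_destroy(struct owner *o)
{
    struct resource *res = o->res; /* read while o is still valid */
    free(o);
    res->refs -= 1;                /* cleanup no longer touches o */
}
```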
Logged
ThemsAllTook
Administrator
Level 10
******



View Profile WWW
« Reply #31 on: November 27, 2022, 01:56:57 PM »

Here's a fun one I just fixed...

In an application where I can click and drag the mouse to rotate a 3D camera, I was noticing that every once in a while, starting a drag would cause a sudden jump in the camera's rotation, way more than it should have been from the amount that I was moving the mouse. Some experimentation revealed that I could make it happen more often if I was already moving the mouse before I pressed the button, though still only around 10% of the time. This was occurring both on Windows and Linux.

I had a hunch that this had something to do with setting mouse delta mode from a mouse down event. Since this is for 3D rotation, I want to be able to continue dragging infinitely in one direction without suddenly being unable to move further due to the mouse cursor hitting a screen edge, so when a drag gesture starts, I activate a mode that processes mouse events differently and returns motion deltas without actually moving the cursor. My current implementation of this on the two platforms where the problem occurs is a little bit janky - I hide the mouse cursor, warp the pointer location to the center of the window, then subtract the cursor position from the window center on every move event and warp it to the center again.

One thing this necessitates is to ignore the large delta of motion that's registered from the warp itself - otherwise, the large jump would happen every time mouse delta mode was activated or deactivated, depending on how close the cursor had been to the window center. My handling for this was to set an explicit screen location to ignore from an upcoming mouse event, so that when the motion event that was caused by warping the cursor came into the event queue, I'd just discard it. This did seem to work most of the time, so why was I still getting large deltas every once in a while?

I managed to confirm my hunch that the large deltas were coming from motion events that were already in the event queue before the cursor was warped, so when processing events in order, the position I was watching for to discard the event didn't come in until after I'd processed one with a different position. Since there's some variance in timing between when the operating system inserts mouse events into the event queue and when my run loop empties it, once in a while it would already have inserted a non-delta-mode motion event into the queue after the mouse down event where I was activating delta mode.

The fix was just a small adjustment to when and how I discard events generated by pointer warping, but it was a bit of an adventure to get there. It also made me realize that there are some other issues with this approach - if I activate delta mode on a window whose center is offscreen, pointer warping doesn't work at all since I'm trying to warp to an offscreen location. A small window close to the screen edge will get a reduced range of motion in the direction of the edge, since the invisible cursor will hit it and be stopped. I should probably find a more direct way to measure mouse deltas than with pointer warping, but maybe I could do a band-aid fix by warping to the center of the screen instead of the center of the window...
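One possible way to handle the stale-event problem, sketched in C. This is not necessarily the fix described above, and the struct and function names are invented. The idea is that after requesting a warp, every motion event is dropped until the one at the warp target shows up, so anything that was already queued before the warp can't produce a delta. It assumes the window system always delivers a motion event for the warp, which may not hold if the cursor was already at the target.

```c
#include <stdbool.h>

/* Hypothetical state for delta mode. */
struct delta_tracker {
    int  center_x, center_y; /* warp target */
    bool awaiting_warp;      /* set to true when a warp is requested */
};

/* Feeds one motion event through the tracker. Returns true and fills in
   dx/dy when the event should be reported as camera motion. */
static bool delta_tracker_motion(struct delta_tracker *t, int x, int y,
                                 int *dx, int *dy)
{
    if (t->awaiting_warp) {
        /* Drop stale pre-warp events, and the warp's own event too. */
        if (x == t->center_x && y == t->center_y) {
            t->awaiting_warp = false;
        }
        return false;
    }

    *dx = x - t->center_x;
    *dy = y - t->center_y;
    /* An event exactly at the center carries no motion to report. */
    return *dx != 0 || *dy != 0;
}
```

The caller would set awaiting_warp when activating delta mode and warping the pointer; the per-move re-warp to the center then shows up as a zero delta and is ignored.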
Logged
