bazze opened this issue on Jan 12, 2007 · 84 posts
kawecki posted Sun, 21 January 2007 at 12:19 PM
The story of a Bug and Microsoft.
More than ten years ago I had no idea about 3D, but as I had always been interested in the subject I downloaded a 3D rendering library from Intel. The library was a pack of DLLs and example applications for Windows 3.1; it was well before any Windows 95, DirectX or OpenGL.
The package included some applications together with their source code. Some examples were amazing: you could rotate the famous teapot, change its material, apply textures or environment maps, and load other models. As I knew nothing about the subject, the first step in learning something was to compile the source code and see if it worked the same as the demo.
As I use the Borland C compiler, I had to change the syntax of the code a little, and after some work I had my compiled demo running. The next step should have been to understand the code, introduce my own alterations and so on, but before that the compiled version and the original version must run in the same way; if not, you have no reference for where a possible error can be.
I played with my compiled version and it worked fine, but in one particular case it crashed. I opened the original demo, ran the same case, and there was no crash. I repeated the test many times and it was always the same story: why did my version crash when Intel's did not?
I must have done something wrong with the code, so my version was not the same as the original.
I looked at my code and it was exactly the same at the point of the crash; the error was a floating point error. Unable to find the error by reading the code, I had to use the debugger and step instruction by instruction to find the point where the error happened. I found the point, but no information about the source of the error. Some floating point variable had an undefined value, and the program crashed at the moment it was used. The question was: why did the variable have a wrong value?
I spent a lot of time trying to find where the variable went wrong and reached the conclusion that a function call into one of Intel's DLLs returned the illegal value. The DLL was doing something it should not do!
As I only had the code of the application and not the code of the DLL, I began to disassemble the DLL. After a lot of work I had the source code of the function responsible for the error, and there was a bug in it: Intel forgot one POP instruction in the subroutine return.
I tweaked the hex code of the DLL a little to correct the bug, and the application ran fine without any crash.
Well, I found the bug and corrected the error. Do you think that is the end of the story? No, it's just the beginning of another, bigger one!
The question is easy: the Intel DLL has a bug, but why does the bug crash only my code and not their original code?
I supposed that the source code in the package was not the same as the code used in their demo. The source code that comes with an executable is not always the same; many times the executable is compiled from a newer version of the source. So there must be something different in the demo exe that keeps it from crashing.
Even though I had the source code, I had to disassemble the demo to find where the difference was. After a lot of work I had the demo's source code, compared it to my source code and, amazingly, it was identical!
Something impossible was happening: Intel and I used the same code, but my code crashed and Intel's did not. A nightmare!
The two executables cannot be the same, since one crashes and the other doesn't. As the source code is the same, there must be some difference in how the executable is generated. I compared the executables and they were exactly the same in the program part, so the difference must reside somewhere outside the demo program.
I compared the whole exe files and there were some differences; the reason was that I used Borland's compiler and Intel used Microsoft's compiler.
As the program code is the same, the difference must reside in the prologue (startup code) that the compiler adds and that runs before the program itself starts.
After hard work and a lot more disassembly I found it!
The difference was in how Microsoft and Borland deal with exceptions (signals that something is wrong and can make the computer malfunction).
The Intel DLL had a bug that in some cases returned a floating point NaN (not a number, an illegal value). It is a serious error because a correct routine should never hand back this value as a result, and an attempt to use it can cause an invalid operation in the floating point processor (something like trying to divide by zero, zero divided by zero, or the square root of a negative number).
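To make the NaN idea concrete, here is a minimal sketch in Python rather than in the original C. It assumes Python floats are IEEE 754 doubles, which is the case on mainstream platforms. The names `make_nan` and `shade_pixel` are made up for illustration and have nothing to do with Intel's library. Python refuses to compute 0.0/0.0 itself, so the sketch uses infinity minus infinity, another invalid operation, to get a NaN.

```python
import math


def make_nan() -> float:
    """Produce a NaN from an invalid operation: infinity minus infinity."""
    inf = float("inf")
    return inf - inf


def shade_pixel(base: float, light: float) -> float:
    """Hypothetical shading step: a NaN input leaks into the output."""
    return base * 0.5 + light


if __name__ == "__main__":
    bad = make_nan()
    print(math.isnan(bad))                    # is the value a NaN?
    print(bad == bad)                         # a NaN never equals itself
    print(math.isnan(shade_pixel(0.8, bad)))  # does the NaN spread?
```

With exceptions ignored, nothing stops the bad value: it simply flows through every later calculation, which is how a single wrong return value can end up as a wrong pixel.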
Once the buggy DLL had returned an illegal value, the next attempt to use it made the floating point unit raise an exception, which started an exception routine (something must be done, because the program did something it should not do), and Borland's response to this error was quite different from Microsoft's.
The Borland compiler is very strict: it doesn't let errors pass, and when an error happens it aborts the program.
The Microsoft compiler, on the other hand, cared very little about errors, so in this case the error was ignored, the program was not aborted, and it continued working with a wrong result.
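The two policies can be sketched with Python's standard `decimal` module, which lets each arithmetic context either trap an invalid operation or just record a flag and carry on. This is only an analogy for the floating point unit's exception masks, not the actual Borland or Microsoft startup code. The names `strict`, `lenient`, `render_value` and `run` are hypothetical.

```python
from decimal import Context, Decimal, InvalidOperation

# Assumption for illustration: "strict" stands in for the Borland behaviour,
# "lenient" for the Microsoft behaviour described above.
strict = Context(traps=[InvalidOperation])  # abort on an invalid operation
lenient = Context(traps=[])                 # record a flag and carry on


def render_value(ctx: Context) -> Decimal:
    """Hypothetical stand-in for the buggy call plus one later use."""
    bad = ctx.sqrt(Decimal(-1))      # invalid operation
    return ctx.add(bad, Decimal(1))  # the next use of the bad value


def run(ctx: Context) -> str:
    """Report whether the calculation was aborted or allowed to continue."""
    try:
        result = render_value(ctx)
    except InvalidOperation:
        return "aborted"
    return f"continued with {result}"


if __name__ == "__main__":
    print("strict: ", run(strict))
    print("lenient:", run(lenient))
    print("lenient flag set:", bool(lenient.flags[InvalidOperation]))
```

The same calculation and the same bug give two different outcomes, and the only thing that differs is a setting made before the calculation starts. In the lenient case the error is still recorded in a flag, but nobody is forced to look at it.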
Well, a wrong pixel doesn't cause any harm, but what if it is a wrong sum in your bank account?
Stupidity also evolves!