Determine why solids.sh fails on 64-bit

BRL-CAD

Status: Closed
Time to complete: 72 hrs
Mentors: Sean
Tags: C, bug, regression test

BRL-CAD has a regression test script called solids.sh that creates a bunch of primitives, renders an image of those primitives, and then compares that image to a reference image. On (most?) 64-bit platforms, the test is off by several RGB values for exactly 3 pixels.
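The comparison step described above can be pictured as a byte-wise diff of two raw RGB buffers. The sketch below is only an illustration of that idea, not the tool solids.sh actually calls; the function name `classify_diff` and the three buckets are assumptions chosen to mirror the "off by many" wording used later in this thread.

```python
def classify_diff(rendered: bytes, reference: bytes) -> dict:
    """Count channel bytes that match, differ by exactly 1, or differ by more."""
    if len(rendered) != len(reference):
        raise ValueError("images must have the same size")
    counts = {"matching": 0, "off_by_1": 0, "off_by_many": 0}
    for a, b in zip(rendered, reference):
        delta = abs(a - b)  # iterating bytes yields ints in 0..255
        if delta == 0:
            counts["matching"] += 1
        elif delta == 1:
            counts["off_by_1"] += 1
        else:
            counts["off_by_many"] += 1
    return counts
```

A test would then pass only when the "off by many" bucket is empty, which is why a handful of divergent pixels is enough to fail it.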

This task involves figuring out why, exactly, this is occurring. It may be helpful to compare intermediate computation results from a 32-bit environment to see where the computations diverge, however slightly. Ultimately, the goal is to identify the cause and a recommended course of action to fix the divergence problem.

Code:

regress/solids.sh
src/librt
src/liboptical

Uploaded Work

File name/URL	File size	Date submitted
vdot1.log	392 bytes	January 08 2013 23:58 UTC
vdot2.log	860 bytes	January 08 2013 23:58 UTC

Comments

Cezar on January 5 2013 16:37 UTC: Task Claimed

I would like to work on this task.

Harmanpreet Singh on January 5 2013 16:40 UTC: Task Assigned

This task has been assigned to Cezar. You have 72 hours to complete this task, good luck!

Cezar on January 5 2013 18:42 UTC: 64 or 32?

Are you sure it fails on 64-bit? It passes on a Mac (64-bit) and a 64-bit Mint Linux, but fails on the VM (32-bit) and 32-bit Mint Linux, with "3 off by many". Seems to me like 32-bit is the problem.

Cezar on January 5 2013 22:03 UTC: Cause

The bug seems to be here, in regress/tgms/solids.mged. If I remove these three lines, I get the same number of "off by many" on both 64- and 32-bit. I guess I'll have to squint at librt/primitives/eto/eto.c for a while to fix this.

in eto.s eto 64 0 32 0 0 1 12 0 12 24 4

r eto.r u eto.s

mater eto.r "plastic {sp .4 di .9}" 155 155 255 0

Sean on January 6 2013 15:03 UTC: could be

It's entirely likely that some platform presented the 3 off-by-many pixels in 64 vs 32 bit and you observe the opposite in your setup. It's possible that it was just recorded wrong too, but if you have it reproduced in 32-bit, that's certainly good enough to debug.

Even without understanding the math, it may help to find a specific ray that gives the different values. Then you can just shoot that single ray (via rt -Q or nirt) and follow the two paths in a debugger. The -X debug flags may also be helpful (will give a bunch of printing statements).
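To pick a specific ray, it helps to turn the image difference into pixel coordinates first. The following is a minimal sketch under stated assumptions: both images are raw 8-bit RGB buffers (3 bytes per pixel) of a known width, and rows are stored one after another starting at row 0. Whether row 0 is the top or the bottom scanline depends on the image format, so the y value may need to be mirrored. The function name `differing_pixels` is hypothetical.

```python
def differing_pixels(a: bytes, b: bytes, width: int, threshold: int = 1):
    """Return (x, y) for each pixel where any channel differs by more than threshold."""
    if len(a) != len(b) or width <= 0 or len(a) % (3 * width) != 0:
        raise ValueError("buffers must be equal-sized RGB images of the given width")
    coords = []
    for i in range(0, len(a), 3):
        # Compare the R, G, and B bytes of this pixel.
        if any(abs(a[i + c] - b[i + c]) > threshold for c in range(3)):
            pixel = i // 3
            coords.append((pixel % width, pixel // width))
    return coords
```

The resulting coordinates are the candidates for the single-ray debugging approach suggested above.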

Cezar on January 6 2013 21:38 UTC: Big logs

I've run rt with the -X flag and I get one 120 MB log file for each platform. The log from the correct rendering has three more lines than the 32-bit one, and I don't think this is a coincidence. If I find those three lines, I'll know where the problem comes from, but the logs are very big (2.5 million lines) and I don't know what I should be looking for. I've also noticed the following difference related to eto.s: the last number is one unit lower in its final digit in the 32-bit version. I'm not sure if such differences are expected, but I think they are relevant.

color (0.0254081, 0.0254081, 0.0418004)

color (0.0254081, 0.0254081, 0.0418003)

Cezar on January 6 2013 22:16 UTC: Found the lines

I found the lines that are missing.

840785: shade_inputs(particle.s) flip N xy=180, 149 ID_PARTICLE surf=2 dot=0.00685944

840797: shade_inputs(particle.s) flip N xy=181, 149 ID_PARTICLE surf=2 dot=0.0489023

846419: shade_inputs(particle.s) flip N xy=180, 150 ID_PARTICLE surf=2 dot=0.017696
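One way to surface lines like these in two very large logs is to stream both files and compare only the flipped-normal events, keyed by solid name and pixel. The sketch below assumes the log lines look exactly like the three quoted above; the regular expression and the names `flip_events` and `missing_flips` are hypothetical and not part of BRL-CAD.

```python
import re

# Matches e.g. "shade_inputs(particle.s) flip N xy=180, 149 ID_PARTICLE surf=2 dot=0.00685944"
FLIP_RE = re.compile(
    r"shade_inputs\((?P<solid>[^)]+)\) flip N "
    r"xy=(?P<x>\d+), (?P<y>\d+) .*dot=(?P<dot>[-+0-9.eE]+)"
)

def flip_events(lines):
    """Map (solid, x, y) to the logged dot product for every flipped-normal line."""
    events = {}
    for line in lines:  # works on any iterable, including an open file
        m = FLIP_RE.search(line)
        if m:
            key = (m["solid"], int(m["x"]), int(m["y"]))
            events[key] = float(m["dot"])
    return events

def missing_flips(lines_a, lines_b):
    """Flip events that appear in the first log but not in the second."""
    a, b = flip_events(lines_a), flip_events(lines_b)
    return {key: dot for key, dot in a.items() if key not in b}
```

Passing two open file objects keeps memory use small, since only the matching lines are retained rather than the full 120 MB logs.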

Sean on January 8 2013 09:36 UTC: that's gold

So that's certainly the high-level "cause" -- the shade_inputs message is saying that a ray was returned with a normal facing away from the camera. The ray is probably nearly perpendicular to the camera as the ray barely grazes a surface/edge.

The dot products are clearly very "near zero" so it could be an issue of not properly testing the ray we're returning or the normal being computed for the particle object.
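To illustrate why a near-zero dot product is fragile, here is a minimal sketch of a flip test. It assumes the check is "reverse the normal when it has a positive dot product with the ray direction", which matches the positive dot values in the log lines but is not copied from the shade_inputs source; the helper names are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def facing_normal(normal, ray_dir):
    """Return (normal, flipped), reversing the normal if it points along the ray."""
    if dot(normal, ray_dir) > 0.0:
        return tuple(-c for c in normal), True
    return tuple(normal), False

ray_dir = (0.0, 0.0, -1.0)
# Two almost identical normals for a grazing hit; only the sign of a
# tiny z component differs, as could happen from rounding.
_, flipped_a = facing_normal((1.0, 0.0, 1e-9), ray_dir)
_, flipped_b = facing_normal((1.0, 0.0, -1e-9), ray_dir)
```

Here `flipped_a` is False and `flipped_b` is True, so a perturbation far smaller than any visible geometric change is enough to send the two platforms down different shading paths.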

FYI, the differences in floating point are expected. Single-precision floating point falls apart around 7 significant digits, so some variation can be expected. As long as the differences stay below 0.0005, our calculations should be stable. That's our guarantee.
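The tolerance mentioned above can be expressed as an absolute comparison. This is a sketch only: the 0.0005 threshold is taken from the comment, while the function name `colors_agree` and its use on color triples are assumptions made for illustration.

```python
import math

TOLERANCE = 0.0005  # threshold quoted in the comment above

def colors_agree(a, b, tol=TOLERANCE):
    """True if every channel of the two colors differs by at most tol."""
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b))

# The two eto.s color values reported earlier in the thread.
color_64 = (0.0254081, 0.0254081, 0.0418004)
color_32 = (0.0254081, 0.0254081, 0.0418003)
within_tolerance = colors_agree(color_64, color_32)
```

By this measure the two reported colors agree, which is consistent with treating that particular difference as expected noise rather than as the cause of the failing pixels.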

Sean on January 8 2013 09:39 UTC: good to go

This will be good enough to close on already, but if you can identify the rays that preceded the three flipped normals -- and their values for the good and bad renderings -- that would help pinpoint the problem more specifically.

Sean on January 8 2013 09:39 UTC: Deadline extended

The deadline of the task has been extended by 1 day and 0 hours.

Cezar on January 8 2013 23:58 UTC: x, y, and z

I logged the x, y, and z of the arguments passed to VDOT in shade_inputs and extracted the differences in those three places. vdot1.log is the correct one.

Cezar on January 8 2013 23:58 UTC: Ready for review

The work on this task is ready to be reviewed.

Sean on January 9 2013 02:42 UTC: Task Closed

Congratulations, this task has been completed successfully.