Vasco.costa (Talk | contribs) (→Development Status) |
Vasco.costa (Talk | contribs) (→Week: 24-30 Aug) |
||
(37 intermediate revisions by 2 users not shown) | |||
Line 35: | Line 35: | ||
|- | |- | ||
|M5.3||OCL lighting modes: Phong, Diffuse, Surface Normals.||||'''TRUNK''' | |M5.3||OCL lighting modes: Phong, Diffuse, Surface Normals.||||'''TRUNK''' | ||
+ | |- | ||
+ | |M5.4||OCL lighting modes: Multi-hit transparent.||||'''TRUNK''' | ||
|- | |- | ||
|M6||TOR and TGC shot routines in OCL.||#393||'''TRUNK''' | |M6||TOR and TGC shot routines in OCL.||#393||'''TRUNK''' | ||
Line 57: | Line 59: | ||
--> | --> | ||
− | The ARB8, ARS | + | The ARB8, ARS, BOT, EHY, ELL, SPH, REC, TOR, TGC, shot routines are in SVN trunk. |
− | SVN trunk also contains solid database device storage and a render function which given a view2model matrix, width, height, can generate an RGB8 bitmap. Diffuse and Surface Normal light models are supported. The renderer does | + | SVN trunk also contains solid database device storage and a render function which given a view2model matrix, width, height, can generate an RGB8 bitmap. Diffuse and Surface Normal light models are supported. The renderer does BVH accelerated ray tracing and ignores the CSG operators. It is integrated as a render option in '''mged'''. |
=Development Phase= | =Development Phase= | ||
Line 214: | Line 216: | ||
* ''M2 committed to opencl branch: kludge up a simple rendering pipeline with grid spatial partitioning traversal acceleration.'' | * ''M2 committed to opencl branch: kludge up a simple rendering pipeline with grid spatial partitioning traversal acceleration.'' | ||
− | : The simple ANSI C rendering pipeline only supports Lambertian reflection with a stock grey material to make things simpler. | + | : The simple ANSI C rendering pipeline only supports Lambertian reflection with a stock grey material to make things simpler. Golliath scene: |
− | + | ||
− | + | ||
− | + | ||
<blockquote> | <blockquote> | ||
{| | {| | ||
Line 304: | Line 303: | ||
|[[File:Rt_ehyn.png|256px]]||[[File:Cl_ehyn.png|256px]]||[[File:Diff_ehyn.png|256px]] | |[[File:Rt_ehyn.png|256px]]||[[File:Cl_ehyn.png|256px]]||[[File:Diff_ehyn.png|256px]] | ||
|- | |- | ||
− | |align="center"|elapsed time @ 972x956: 0.35 sec||align="center"|elapsed time @ 972x956: 0. | + | |align="center"|elapsed time @ 972x956: 0.35 sec||align="center"|elapsed time @ 972x956: 0.06 sec|| |
|} | |} | ||
</blockquote> | </blockquote> | ||
Line 334: | Line 333: | ||
|align="center"|1 million triangles | |align="center"|1 million triangles | ||
|- | |- | ||
− | |align="center"|elapsed time @ 972x956: '''0. | + | |align="center"|elapsed time @ 972x956: '''0.14 sec''' (OCL) |
|- | |- | ||
|align="center"|elapsed time @ 972x956: 17.49 sec (RT) | |align="center"|elapsed time @ 972x956: 17.49 sec (RT) | ||
Line 342: | Line 341: | ||
</blockquote> | </blockquote> | ||
: All math operations are done in double precision FP. | : All math operations are done in double precision FP. | ||
+ | * Fix bugs in bot triangle data parsing. | ||
+ | * Add gamma correction and haze. | ||
+ | * Fix a bug in hlbvh construction in certain edge cases where the primitive bounding boxes are empty. | ||
+ | * Experimental bot triangle normal support. | ||
+ | * Phong shading lighting model. | ||
+ | * Handle UNORDERED, CW, and CCW triangle vertices to fix bot normal generation. | ||
+ | |||
+ | * Added material colors to the OCL render. The colors are somewhat buggy because there is no easy way, that I know of, of getting the actual material associated with a solid in the table. Materials belong to regions, not to solids, and any solid may be in a number of regions. Figuring out the material without consulting the actual CSG tree, which has the regions, is hence non-trivial. | ||
+ | |||
+ | * Added a lightmodel with transparent multi-hit rendering to show the multi-hit facilities. | ||
+ | <blockquote> | ||
+ | {| | ||
+ | !'''Golliath (OCL)''' | ||
+ | |- | ||
+ | |[[File:Cl_golliath.png|512px]] | ||
+ | |- | ||
+ | |align="center"|elapsed time @ 972x956: 0.33 sec | ||
+ | |} | ||
+ | |||
+ | * Fix linking errors in AMD OCL SDK. | ||
+ | * Fix issues with OCL color render. | ||
+ | * Fix issue when doing a render with nothing on view. | ||
+ | * Set the local workgroup size when rendering to use subgrids of up to 8x8 to maximize coherency of accesses. This speeds things up by roughly 2x. | ||
+ | |||
+ | * Tested an adaptation of ''Understanding the Efficiency of Ray Traversal on GPUs. Timo Aila and Samuli Laine, Proc. High-Performance Graphics 2009.'' It was not significantly better on the GTX TITAN compared with just shooting rays in 8x8 blocks. You can read more about it here: | ||
+ | ** https://sourceforge.net/p/brlcad/patches/416/ | ||
+ | |||
+ | =Post Development Phase= | ||
+ | === Week: 24-30 Aug === | ||
+ | * Use less memory to store solid ids and materials. Eliminate some more branches and simplify logic in solver. | ||
+ | * Compute transparency using attenuation. | ||
+ | |||
+ | * bool.c cleanups. If we ever are to port the standard BRL-CAD CSG evaluator algorithm to OpenCL C, given that there seem to be no other major viable options which give sufficiently correct results for our project's purposes, this code must be brought to heel. Such a task would be immense. I hope I helped this with a series of patches to: remove <code>goto</code> (not available in OpenCL C), to re-compile the bool trees (binary tree of pointers) to a linear postfix array form. This form is easier to parse and eval during the rendering stage. I did those tasks in these stages: | ||
+ | **eliminated all gotos in <code>rt_default_multioverlap()</code>. | ||
+ | **eliminated all gotos in <code>rt_boolweave()</code>. | ||
+ | **produced a patch to use the postfix linear tree. It uses a lot less memory (64 bits per node) and the traversal is more cache coherent. The CSG inference engine supports these operators: UNION, INTERSECT, DIFFERENCE, XOR, NOT, SOLID, NOP. | ||
+ | ::It might require re-interfacing with db code in particular for the way XOR operations used to be treated. I reimplemented these functions to use the postfix bool tree: | ||
+ | ::<code>rt_tree_max_raynum()</code>, <code>rt_tree_test_ready()</code>, <code>rt_booleval()</code>, <code>rt_solid_bitfinder()</code>. | ||
+ | ::*https://sourceforge.net/p/brlcad/patches/417/ | ||
+ | |||
+ | * Process segments instead of hit points. Use registers to store segments. Make all available rendering modes (full, diffuse, normals, multi-hit transparent) work in a single pass. This speeds up the full and transparent modes by roughly 2-3x. | ||
+ | * Also updated the multiple-kernel launch renderer code to work with the segment list approach. It might be slower than the single-kernel launch renderer but we might eventually need the whole segment list in memory at the same time to perform more advanced rendering. | ||
+ | * Fixed the OCL material colors. It seems a solid's basic material color is at the end rather than the beginning of the regions list it has... | ||
+ | |||
+ | |||
+ | * Well folks GSoC 2015 is finally over! Mission complete! I thank everyone who made this possible: | ||
+ | **Google: Carol Smith | ||
+ | **BRL-CAD: brlcad (Sean), Stragus, ``Erik, starseeker. | ||
+ | These were the most notable task supporters to list. The deepest thanks go to my parents for tirelessly supporting me during this code marathon. |
I made two patches for OpenCL (OCL) shot code. One patch refactors the existing SPH (Sphere) shot code, and the other patch implements EHY (Elliptical Hyperboloid) shot code.
Milestone | Description | Patch | Status |
---|---|---|---|
M0.1 | fix OCL SPH shot routine compilation errors. | #341 | TRUNK |
M0.2 | EHY shot routine in OCL. | #346 | TRUNK |
M1 | ELL and ARB8 shot routines in OCL. | #370 | TRUNK |
M2 | | | see M5 |
M3.0 | | #379 | see M3.2 |
M3.1 | | #379 | see M3.2 |
M3.2 | HLBVH object partitioning builder in C. traversal in OCL. | | TRUNK |
M4 | GPU side database storage of OCL implemented primitives. | #392 | TRUNK |
M5 | port compute intensive or critical parts of the dispatcher, | | TRUNK |
M5.1 | OCL dispatcher that performs the shot routines for a whole frame. | | TRUNK |
M5.2 | OCL rasterizer that does the pixel pushing for a whole frame. | | TRUNK |
M5.3 | OCL lighting modes: Phong, Diffuse, Surface Normals. | | TRUNK |
M5.4 | OCL lighting modes: Multi-hit transparent. | | TRUNK |
M6 | TOR and TGC shot routines in OCL. | #393 | TRUNK |
M6.1 | REC shot routine in OCL. | | TRUNK |
M6.2 | Surface normal routines for all seven OCL implemented primitives. | | TRUNK |
M7 | BOT shot routine in OCL. | - | |
M7.1 | Simple BOT shot routine in OCL that computes triangle hits and normals brute force. | | TRUNK |
M7.2 | CPU HLBVH BOT shot construction with OCL traversal and interpolated per pixel normals. | | TRUNK |
The ARB8, ARS, BOT, EHY, ELL, SPH, REC, TOR, and TGC shot routines are in SVN trunk.
SVN trunk also contains solid database device storage and a render function which, given a view2model matrix, width, and height, can generate an RGB8 bitmap. Diffuse and Surface Normal light models are supported. The renderer does BVH accelerated ray tracing and ignores the CSG operators. It is integrated as a render option in mged.
do_frame() → do_run() → worker()* → do_pixel()* → rt_shootray()* → rt_*_shot()
rt_prep_parallel() → rt_cut_it() → rt_nugrid_cut()
    # GPU execution
    GEN_RAYS(args)
    lengths = COMPUTE_LEN_SEGMENTS(rays, db)
    segs = ALLOC_SEGMENTS(lengths)
    segs = COMPUTE_SEGMENTS(rays, db, segs)
    # CPU execution
    waiting_segs = READ_SEGMENTS(segs)
    # merge with CPU computed segments for non-accelerated primitives
    finished_segs = RT_BOOL_WEAVE(waiting_segs)
    partitions = RT_BOOL_FINAL(finished_segs)
    pixels = VIEWSHADE(rays, db, partitions)
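The count, allocate, fill sequence in the pseudocode above is commonly implemented with an exclusive prefix sum over the per-ray segment counts. The following is a minimal CPU-side sketch of that step, assuming one count per ray; `seg_offsets` is a hypothetical name and is not taken from the BRL-CAD sources.

```c
#include <assert.h>
#include <stddef.h>

/* Exclusive prefix sum over per-ray segment counts (hypothetical helper).
 * offsets[i] receives the index of ray i's first segment in the packed
 * segment buffer; the return value is the total number of segments,
 * i.e. how many entries the allocation step has to reserve. */
size_t seg_offsets(const unsigned *lengths, size_t *offsets, size_t nrays)
{
    size_t total = 0;
    assert(lengths != NULL && offsets != NULL);
    for (size_t i = 0; i < nrays; i++) {
        offsets[i] = total;   /* where ray i starts writing */
        total += lengths[i];  /* running total of segments so far */
    }
    return total;
}
```

With the offsets known, the second pass can let each ray write its segments into its own slice of a single buffer without any synchronization between rays.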
    # e.g. test for tgc
    make tgc tgc
    e tgc ; rt -o rt_tgc.pix
    # OR
    e tgc ; rt -o cl_tgc.pix
    pixdiff rt_tgc.pix cl_tgc.pix | pix-fb
    # test results:
    arb8: pixdiff bytes: 777500 matching, 8932 off by 1, 0 off by many
    ehy: pixdiff bytes: 760977 matching, 25443 off by 1, 12 off by many
    ell: pixdiff bytes: 764588 matching, 21844 off by 1, 0 off by many
    sph: pixdiff bytes: 736942 matching, 49490 off by 1, 0 off by many
    tgc: pixdiff bytes: 783191 matching, 3241 off by 1, 0 off by many
    tor: pixdiff bytes: 774138 matching, 12294 off by 1, 0 off by many
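The three counters in the results above can be reproduced with a simple byte-wise comparison. This is a minimal sketch of that classification, not the actual pixdiff source; `struct diff_counts` and `count_byte_diffs` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Tallies in the same three buckets that the results above report. */
struct diff_counts {
    size_t matching;
    size_t off_by_1;
    size_t off_by_many;
};

/* Compare two raw byte buffers of equal length n, one byte at a time. */
struct diff_counts count_byte_diffs(const unsigned char *a,
                                    const unsigned char *b, size_t n)
{
    struct diff_counts c = {0, 0, 0};
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        int d = abs((int)a[i] - (int)b[i]);
        if (d == 0)
            c.matching++;
        else if (d == 1)
            c.off_by_1++;
        else
            c.off_by_many++;
    }
    return c;
}
```

Each row of the results sums to 786432 bytes, which is consistent with a 512x512 image at 3 bytes per pixel.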
[Figure: RT (EHY), OCL (EHY), PIXDIFF (EHY)]
asm("sqrt.rp.f64 %0, %1;" : "=d"(b) : "d"(a));
OCL 1.1 and later have no support for setting rounding modes without using inline assembly.
    arb8 160
    ell 2
    pipe 10
    tgc 94
    tor 22
OCL Sphere (Surface Normals) | OCL Sphere (Diffuse) | OCL Havoc (Surface Normals) |
---|---|---|
elapsed time: 0.05 sec | elapsed time: 0.05 sec | elapsed time: 4.20 sec |
RT Hyperboloid | OCL Hyperboloid | PIXDIFF Hyperboloid |
---|---|---|
elapsed time @ 972x956: 0.35 sec | elapsed time @ 972x956: 0.06 sec | |
ehy: pixdiff bytes: 760757 matching, 25663 off by 1, 12 off by many
I got similar results. So the pixel engine shouldn't be more inaccurate than the regular one. What I did find out in surface normals mode was that the CPU code actually is showing hits with the side of the hyperboloid (see the blue dots in the figure at the left), despite this view being top down. So maybe the GPU version is actually more accurate? The differences show a nice noisy pattern without obvious banding or moiré, so there don't seem to be any major issues with the hits, normals, and raster.
`-z` OpenCL command line option when running `rt -h`.
Buddha (OCL) |
---|
1 million triangles |
elapsed time @ 972x956: 0.14 sec (OCL) |
elapsed time @ 972x956: 17.49 sec (RT) |
elapsed time @ 972x956: 0.49 sec (RT bot kd-tree) |
Golliath (OCL) |
---|
elapsed time @ 972x956: 0.33 sec |
- Fix linking errors in AMD OCL SDK.
- Fix issues with OCL color render.
- Fix issue when doing a render with nothing on view.
- Set the local workgroup size when rendering to use subgrids of up to 8x8 to maximize coherency of accesses. This speeds things up by roughly 2x.
- Tested an adaptation of Understanding the Efficiency of Ray Traversal on GPUs. Timo Aila and Samuli Laine, Proc. High-Performance Graphics 2009. It was not significantly better on the GTX TITAN compared with just shooting rays in 8x8 blocks. You can read more about it here:
  - https://sourceforge.net/p/brlcad/patches/416/
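The 8x8 subgrid idea can be sketched on the host side as follows. In OpenCL 1.x the global work size must be a multiple of the local work size, so a 972x956 frame has to be padded before launching with an 8x8 work-group. The helper names `round_up` and `tile_index` are hypothetical and only illustrate the arithmetic; they are not taken from the renderer.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m (m > 0). */
size_t round_up(size_t n, size_t m)
{
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

/* Index of the tile x tile subgrid (work-group) that pixel (x, y)
 * belongs to, for a frame that is `width` pixels wide. Pixels in the
 * same subgrid are traced together, so their rays tend to visit the
 * same BVH nodes. */
size_t tile_index(size_t x, size_t y, size_t width, size_t tile)
{
    size_t tiles_per_row;
    assert(tile > 0);
    tiles_per_row = round_up(width, tile) / tile;
    return (y / tile) * tiles_per_row + (x / tile);
}
```

For the 972x956 frames timed above this gives a padded global size of 976x960. The kernel then needs a bounds check so that the padding work-items outside the frame return without writing a pixel.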
Post Development Phase
Week: 24-30 Aug
- Use less memory to store solid ids and materials. Eliminate some more branches and simplify logic in solver.
- Compute transparency using attenuation.
- bool.c cleanups. If we ever are to port the standard BRL-CAD CSG evaluator algorithm to OpenCL C, given that there seem to be no other major viable options which give sufficiently correct results for our project's purposes, this code must be brought to heel. Such a task would be immense. I hope I helped this with a series of patches to: remove `goto` (not available in OpenCL C), to re-compile the bool trees (binary tree of pointers) to a linear postfix array form. This form is easier to parse and eval during the rendering stage. I did those tasks in these stages:
  - eliminated all gotos in `rt_default_multioverlap()`.
  - eliminated all gotos in `rt_boolweave()`.
  - produced a patch to use the postfix linear tree. It uses a lot less memory (64 bits per node) and the traversal is more cache coherent. The CSG inference engine supports these operators: UNION, INTERSECT, DIFFERENCE, XOR, NOT, SOLID, NOP.
    It might require re-interfacing with db code, in particular for the way XOR operations used to be treated. I reimplemented these functions to use the postfix bool tree: `rt_tree_max_raynum()`, `rt_tree_test_ready()`, `rt_booleval()`, `rt_solid_bitfinder()`.
    - https://sourceforge.net/p/brlcad/patches/417/
- Process segments instead of hit points. Use registers to store segments. Make all available rendering modes (full, diffuse, normals, multi-hit transparent) work in a single pass. This speeds up the full and transparent modes by roughly 2-3x.
- Also updated the multiple-kernel launch renderer code to work with the segment list approach. It might be slower than the single-kernel launch renderer but we might eventually need the whole segment list in memory at the same time to perform more advanced rendering.
- Fixed the OCL material colors. It seems a solid's basic material color is at the end rather than the beginning of the regions list it has...
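To illustrate why the linear postfix form mentioned in the bool.c item is easy to evaluate, here is a minimal sketch of a stack-based evaluator for the listed operators. The node layout (two 32-bit fields, 64 bits in total), the enum values, and the function name `bool_eval_postfix` are assumptions made for this example; they are not the layout or API used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operator codes; the real patch may number them differently. */
enum bool_op { OP_NOP, OP_SOLID, OP_UNION, OP_INTERSECT,
               OP_DIFFERENCE, OP_XOR, OP_NOT };

/* One 64-bit node: the operator and, for OP_SOLID, the solid's index. */
struct bool_node {
    uint32_t op;
    uint32_t solid;
};

#define BOOL_STACK_MAX 64

/* Evaluate a postfix bool tree with a small fixed-size stack and no
 * recursion or goto. inside[s] is nonzero when the point being tested
 * lies inside solid s. Returns 1 or 0, or -1 if the tree is malformed. */
int bool_eval_postfix(const struct bool_node *nodes, size_t n,
                      const unsigned char *inside)
{
    int stack[BOOL_STACK_MAX];
    size_t sp = 0;
    assert(nodes != NULL && inside != NULL);
    for (size_t i = 0; i < n; i++) {
        int a, b, r;
        switch (nodes[i].op) {
        case OP_NOP:
            break;
        case OP_SOLID:
            if (sp == BOOL_STACK_MAX) return -1;
            stack[sp++] = inside[nodes[i].solid] != 0;
            break;
        case OP_NOT:
            if (sp < 1) return -1;
            stack[sp - 1] = !stack[sp - 1];
            break;
        case OP_UNION:
        case OP_INTERSECT:
        case OP_DIFFERENCE:
        case OP_XOR:
            if (sp < 2) return -1;
            b = stack[--sp];  /* right operand was pushed last */
            a = stack[--sp];
            if (nodes[i].op == OP_UNION)          r = a || b;
            else if (nodes[i].op == OP_INTERSECT) r = a && b;
            else if (nodes[i].op == OP_DIFFERENCE) r = a && !b;
            else                                   r = a != b;
            stack[sp++] = r;
            break;
        default:
            return -1;
        }
    }
    return sp == 1 ? stack[0] : -1;
}
```

For example, the expression (A - B) u C is stored as the array A, B, DIFFERENCE, C, UNION and evaluated in one left-to-right sweep, which is where the cache-coherent traversal comes from.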
- Well folks GSoC 2015 is finally over! Mission complete! I thank everyone who made this possible:
  - Google: Carol Smith
  - BRL-CAD: brlcad (Sean), Stragus, ``Erik, starseeker.

These were the most notable task supporters to list. The deepest thanks go to my parents for tirelessly supporting me during this code marathon.