User:Marco-domingues/GSoC17/Log


=== 25 July ===
* Ran further tests on some scenes in 'share/db' to gather more statistics for optimizing the code
=== 26 July ===
* Fixed the issue that was causing some interior normals to be wrongly represented
* Working on a way to iterate only over evaluated partitions during the shading process
=== 27 July ===
* Optimized the construction of the regiontable by precomputing a table with all the regions each primitive is involved in, instead of recomputing this for each partition in the rt_boolfinal kernel.
* This change resulted in a significant performance improvement, as shown in the following table:
(Running the OpenCL code with the NVIDIA OpenCL SDK on GPU - Debug build)
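The precomputation described above can be pictured as a flat lookup table built once on the host. The following is a minimal sketch in C, not the code from the opencl branch: `struct region_table`, `build_region_table()` and the input layout (a list of primitive/region membership pairs) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flat table: the regions of primitive p are
 * ids[offsets[p]] .. ids[offsets[p + 1] - 1]. */
struct region_table {
    size_t *offsets;   /* nprims + 1 entries */
    unsigned *ids;     /* npairs entries */
};

/* Build the table once from (primitive, region) membership pairs.
 * Returns 0 on success, -1 on allocation failure. */
int
build_region_table(struct region_table *t, size_t nprims,
                   const unsigned *pair_prim, const unsigned *pair_region,
                   size_t npairs)
{
    size_t i;
    size_t *cursor;

    t->offsets = calloc(nprims + 1, sizeof(size_t));
    t->ids = malloc((npairs ? npairs : 1) * sizeof(unsigned));
    cursor = malloc((nprims ? nprims : 1) * sizeof(size_t));
    if (!t->offsets || !t->ids || !cursor) {
        free(t->offsets);
        free(t->ids);
        free(cursor);
        return -1;
    }

    /* Pass 1: count how many regions reference each primitive. */
    for (i = 0; i < npairs; i++) {
        assert(pair_prim[i] < nprims);
        t->offsets[pair_prim[i] + 1]++;
    }
    /* A prefix sum turns the counts into start offsets. */
    for (i = 0; i < nprims; i++)
        t->offsets[i + 1] += t->offsets[i];

    /* Pass 2: scatter the region ids into their slots. */
    for (i = 0; i < nprims; i++)
        cursor[i] = t->offsets[i];
    for (i = 0; i < npairs; i++)
        t->ids[cursor[pair_prim[i]]++] = pair_region[i];

    free(cursor);
    return 0;
}
```

With a table of this shape, a kernel that knows which primitive a segment belongs to can read the involved regions directly instead of searching for them per partition.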
=== 31 July ===
* Changed the code to skip unevaluated partitions during the shading process. This change doesn't appear to have much impact on performance (2.58sec before vs 2.52sec now) for the havoc.g scene.
* The major bottleneck in the code right now seems to be the bool_eval() function. Disabling it renders the havoc scene in only 0.35sec (vs 2.52sec with the function enabled), for what it's worth.
=== 1 August ===
* Found and fixed the bug that was causing some scenes to shade the wrong partitions along the ray. The closest partition along the ray is now the one being shaded.
* Example of what was happening before vs what is happening now:
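Separately from the before/after comparison mentioned above, the selection rule itself (shade the evaluated partition nearest to the ray origin, skipping unevaluated ones) can be sketched in a few lines of C. This is only an illustration: `struct part_sketch` and `closest_evaluated()` are hypothetical names, and the real partition structure carries much more state.

```c
#include <stddef.h>

/* Hypothetical, simplified partition record. */
struct part_sketch {
    double in_dist;   /* distance from the ray origin to the entry hit */
    int evaluated;    /* nonzero if boolean evaluation accepted it */
};

/* Return the index of the nearest evaluated partition, or -1 if none. */
int
closest_evaluated(const struct part_sketch *pp, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!pp[i].evaluated)
            continue;   /* skip partitions rejected by boolean evaluation */
        if (best < 0 || pp[i].in_dist < pp[best].in_dist)
            best = (int)i;
    }
    return best;
}
```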
=== 2 - 4 August ===
* Working on a linearized binary tree representation to perform bool_eval() in OpenCL in a way similar to the current code in the trunk, instead of using the boolean tree in RPN.
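To make the idea of a linearized binary tree concrete, here is a minimal sketch in C. It assumes a layout in which every node is stored in one array and children always sit at higher indices than their parent; this layout, `struct lt_node` and `lt_eval()` are hypothetical and are not the representation used in the opencl branch. The evaluation avoids recursion, which OpenCL C kernels do not allow.

```c
#include <assert.h>
#include <stddef.h>

enum lt_op { LT_SOLID, LT_UNION, LT_INTERSECT, LT_SUBTRACT };

/* Hypothetical node of a binary tree flattened into an array.
 * Children are stored at higher indices than their parent. */
struct lt_node {
    enum lt_op op;
    unsigned left;    /* child index, or solid id when op == LT_SOLID */
    unsigned right;   /* child index; unused for leaves */
};

#define LT_MAX_NODES 64

/* Evaluate the tree rooted at node 0. inside[s] is nonzero when the
 * interval under test lies inside solid s. */
int
lt_eval(const struct lt_node *t, size_t n, const int *inside)
{
    int val[LT_MAX_NODES];
    size_t i;

    assert(n > 0 && n <= LT_MAX_NODES);
    /* Walk backwards so both children are resolved before the parent. */
    for (i = n; i-- > 0;) {
        const struct lt_node *nd = &t[i];
        int l, r;

        if (nd->op == LT_SOLID) {
            val[i] = inside[nd->left] != 0;
            continue;
        }
        assert(nd->left > i && nd->left < n);
        assert(nd->right > i && nd->right < n);
        l = val[nd->left];
        r = val[nd->right];
        if (nd->op == LT_UNION)
            val[i] = l || r;
        else if (nd->op == LT_INTERSECT)
            val[i] = l && r;
        else
            val[i] = l && !r;   /* LT_SUBTRACT */
    }
    return val[0];
}
```

For example, the expression (A u B) - C becomes five array entries: the subtraction at index 0, the union at index 1, and the three leaves after them.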
=== 6 August ===
* Changing bool_eval() and other auxiliary functions to use the new tree structure during boolean evaluation.
=== 7 August ===
* Finished the implementation of bool_eval() using the new tree representation. The new code seems to be slightly slower than the version in the trunk for some scenes, but still considerably faster than the previous method that uses the RPN tree. Thus, porting the new code to OpenCL should improve the performance of the OpenCL code, which currently uses the RPN tree method.
* Here is a table with a time comparison between the code in the trunk, the code with the new tree structure and the code currently in the opencl branch (RPN tree):
(Release build)
* Removed the 'next_evalpp' from the partition structure, which wasn't necessary in the first place
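For readers unfamiliar with the RPN method referred to above, the sketch below shows how a boolean expression in postfix form is typically evaluated with a value stack. It is a generic illustration in C under assumed conventions (non-negative tokens are solid ids, negative tokens are operators); the token encoding and `rpn_eval()` are hypothetical and do not reproduce the opencl branch code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token encoding: values >= 0 are solid ids,
 * negative values are operators. */
enum { RPN_UNION = -1, RPN_INTERSECT = -2, RPN_SUBTRACT = -3 };

#define RPN_MAX_STACK 64

/* Evaluate a postfix expression. inside[s] is nonzero when the
 * interval under test lies inside solid s. */
int
rpn_eval(const int *tok, size_t n, const int *inside)
{
    int stack[RPN_MAX_STACK];
    size_t sp = 0, i;

    for (i = 0; i < n; i++) {
        if (tok[i] >= 0) {
            /* Operand: push the membership of this solid. */
            assert(sp < RPN_MAX_STACK);
            stack[sp++] = inside[tok[i]] != 0;
        } else {
            /* Operator: pop two operands, push the result. */
            int r, l;

            assert(sp >= 2);
            r = stack[--sp];
            l = stack[--sp];
            if (tok[i] == RPN_UNION)
                stack[sp++] = l || r;
            else if (tok[i] == RPN_INTERSECT)
                stack[sp++] = l && r;
            else
                stack[sp++] = l && !r;   /* RPN_SUBTRACT */
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

In this encoding, (A u B) - C is written as the token sequence `0, 1, RPN_UNION, 2, RPN_SUBTRACT`.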
=== 8 August ===
* Made some optimizations in the ANSI C bool_eval() prototype and now the new code is slightly faster than the code currently in the trunk! This difference is more noticeable when rendering the havoc scene with the command "rt -s2048": 2.53sec vs 2.18sec.
(Release build)
=== 9 August ===
* Manually merged the new code into the opencl branch code.
* Changed the function 'rt_pr_bit_tree()', used to debug the new tree structure, so that its output matches the output of the union tree debug function (previously the function was printing newlines out of place).
* Updated some structures and some host functions to create the new OpenCL buffer with the new tree representation.
* Planning to finish the bool_eval() function of the OpenCL code tomorrow.
=== 10 August ===
* Finished the port of the new_bool() function to OpenCL. This new bool_eval() function uses a new boolean tree representation, and follows the behaviour of the current ANSI C code in the trunk.
* Here is a table with the time comparison of the previous implementation (bool_eval() with RPN tree) and the current version with the new tree representation:
(running the OpenCL over the Intel OpenCL SDK on CPU - Release build)
=== 11 August ===
* Cleaning the code to prepare for commit.
* General testing over the ANSI C boolean evaluation and the OpenCL boolean evaluation, trying to identify where the bottlenecks are located.
* Committed the new code to perform boolean evaluation in the opencl branch code (https://sourceforge.net/p/brlcad/code/70074/)
=== 14 August ===
* Discussed the plan for the next weeks with my mentor, Vasco, via Skype.
* Will be cleaning the current code and preparing a patch ticket against the trunk with the new CSG boolean evaluation in OpenCL.
* Next, I will start changing the rendering loop of the OpenCL code to follow the behaviour of the ANSI C code, where boolean evaluation is performed in a partial fashion.
=== 15 August ===
* Cleaning and refactoring the code
* Preparing patch against trunk code
=== 17 August ===
* Submitted patch with code to perform boolean evaluation of CSG with OpenCL (https://sourceforge.net/p/brlcad/patches/474/)
* Started working on a new rendering loop, where the 'store_segs', 'rt_boolweave' and 'rt_boolfinal' kernels will be merged into a single kernel, so that the weaving of segments and the evaluation of partitions can take place as soon as new segments are created.
=== 18 August ===
* Merged the 'store_segs' and 'rt_boolweave' kernels into a new kernel. Still trying to avoid weaving again the segments that have already been woven into the ray.
* Planning to add the 'rt_boolfinal' kernel next to follow the behaviour of the ANSI C code.
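As a rough picture of what weaving segments into partitions means, the sketch below splits a ray into intervals at every segment boundary and records which solids cover each interval. It is a deliberately simplified, batch version in C: `wv_weave()` and its structures are hypothetical, and the actual rt_boolweave code works incrementally on partition lists rather than by sorting all endpoints.

```c
#include <assert.h>
#include <stdlib.h>

#define WV_MAX_SEGS 16

struct wv_seg  { double in, out; unsigned solid; };  /* solid id < 32 */
struct wv_part { double in, out; unsigned mask; };   /* bit s set: inside solid s */

static int
wv_cmp(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Weave nseg segments of one ray into partitions ordered by entry
 * distance. `out` must have room for 2 * nseg - 1 entries.
 * Returns the number of partitions written. */
size_t
wv_weave(const struct wv_seg *seg, size_t nseg, struct wv_part *out)
{
    double cut[2 * WV_MAX_SEGS];
    size_t i, k, nout = 0;

    assert(nseg <= WV_MAX_SEGS);
    /* Every segment endpoint is a potential partition boundary. */
    for (i = 0; i < nseg; i++) {
        assert(seg[i].solid < 32 && seg[i].in < seg[i].out);
        cut[2 * i] = seg[i].in;
        cut[2 * i + 1] = seg[i].out;
    }
    qsort(cut, 2 * nseg, sizeof(cut[0]), wv_cmp);

    for (k = 0; k + 1 < 2 * nseg; k++) {
        unsigned mask = 0;

        if (cut[k] == cut[k + 1])
            continue;   /* zero-length interval */
        /* Collect the solids whose segment spans this interval. */
        for (i = 0; i < nseg; i++)
            if (seg[i].in <= cut[k] && seg[i].out >= cut[k + 1])
                mask |= 1u << seg[i].solid;
        if (mask) {
            out[nout].in = cut[k];
            out[nout].out = cut[k + 1];
            out[nout].mask = mask;
            nout++;
        }
    }
    return nout;
}
```

Two overlapping segments, [1, 4] in solid 0 and [2, 6] in solid 1, weave into three partitions: [1, 2] in solid 0 only, [2, 4] in both, and [4, 6] in solid 1 only. Boolean evaluation then decides, per partition, whether that combination of solids belongs to a region.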
=== 21 - 25 August ===
* Changed the 'rt_boolfinal' function so that this kernel can be added to the single kernel (rt_shootray kernel: store_segs + rt_boolweave + rt_boolfinal).
* Fixed a bug that caused the ray tracing to crash with this new system, by storing the index of the head of the partitions for the current ray and passing it as an argument to the rt_boolweave function.
* Trying to fix the problem of some partitions being evaluated too early, which caused some pixels to shade the incorrect partitions.
* In the end, I couldn't figure out a way to weave and evaluate partitions in a partial way using the BVH (bounding volume hierarchy), because the BVH nodes aren't in spatial order. This optimization was promising, and it should work if we change the code to store the nodes in a spatial subdivision structure like a kd-tree, as the ANSI C code does.
* Since we are processing all the hits before weaving segments and evaluating partitions, there is no need to keep evaluating partitions after the first opaque partition in the ray is evaluated. Because the partitions are already ordered by their in_hit point, and because all segments of the ray have been processed, no partition closer to the ray origin can appear after the first partition is evaluated, so it is unnecessary and expensive to keep evaluating partitions for the ray.
* By stopping the evaluation of partitions once the first evaluated partition of the ray is found, the performance of the OpenCL code increased significantly, as we can see in the following table:
(running the OpenCL over the Intel OpenCL SDK on CPU - Release build)
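The early-exit idea described above can be sketched as a loop over partitions that are already sorted by entry distance. This is a minimal illustration in C with hypothetical names (`struct ray_part`, `first_accepted()`); the `accepted` field stands in for the result of the expensive boolean evaluation, and `nevals` only exists here to show how many evaluations were actually needed.

```c
#include <stddef.h>

/* Hypothetical partition record; `accepted` stands in for the
 * result of boolean evaluation on that partition. */
struct ray_part {
    double in_dist;   /* partitions are sorted by this field */
    int accepted;
};

/* Return the index of the first accepted partition, or -1 if none.
 * *nevals receives the number of evaluations that were performed. */
int
first_accepted(const struct ray_part *pp, size_t n, size_t *nevals)
{
    size_t i;

    *nevals = 0;
    for (i = 0; i < n; i++) {
        (*nevals)++;         /* one boolean evaluation would happen here */
        if (pp[i].accepted)
            return (int)i;   /* sorted input: nothing closer can follow */
    }
    return -1;
}
```

With four partitions of which the second is the first to be accepted, only two evaluations are performed instead of four.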
=== GSoC17 is Over!! ===
* Google Summer of Code 2017 comes to an end! It was an amazing experience and I couldn't be happier with this first introduction to open source software development!
* I would like to thank the BRL-CAD community for giving me this opportunity and for always being available to help!
* A special thanks to Vasco Costa, for the great mentoring and guidance through the summer!!
* Here is the work product link that I submitted: https://github.com/MarcoSDomingues/GSoC17
