Community Bonding Period

I made two patches for the OpenCL (OCL) shot code. One patch refactors the existing SPH (Sphere) shot code, and the other implements the EHY (Elliptical Hyperboloid) shot code.

Background Research

Development Status

Milestone	Description	Patch	Status
M0.1	fix OCL SPH shot routine compilation errors.	#341	TRUNK
M0.2	EHY shot routine in OCL.	#346	TRUNK

M1	ELL and ARB8 shot routines in OCL.	#370	TRUNK
M2	refactor dispatcher, shoot, optical renderer to process many rays in parallel in C when rendering an image or block.		BRANCH
M3	grid spatial partitioning in OCL.	#379	DONE
M4	GPU side database storage of OCL implemented primitives SPH, EHY, ELL, ARB8.	#392	TRUNK
M5	port compute intensive or critical parts of the dispatcher, boolean evaluation, optical renderer to OCL.		?
M6	TOR and TGC shot routines in OCL.	#393	TRUNK
M7	BOT shot routine in OCL.		CANCELLED

The ARB8, EHY, ELL, SPH, TOR, and TGC shot routines are in SVN trunk.

Development Phase

Week 1 : 25-31 May

Created some example .g files in mged for the primitives to be implemented this week. The Quick Reference Card proved to be quite useful.
Did the matrix ops for EHY (Elliptical Hyperboloid) on the OCL side.
- https://sourceforge.net/p/brlcad/patches/346/

Made patch for ELL (Generalized Ellipsoid) and ARB8 (Arbitrary Polyhedron) OCL shots.
M1 complete: ELL, ARB8 shot routines in OCL.
- https://sourceforge.net/p/brlcad/patches/370/

Tried out a bunch of code browsing tools (cscope, LXR, doxygen, etc). The NetBeans IDE seems the most promising.

Week 2 : 1-7 Jun

Read code to better understand the main rendering loop. It seems to be something like this:

do_frame() → do_run() → worker()* → do_pixel()* → rt_shootray()* → rt_*_shot()

The code is recursive (which is problematic for OCL). As a first approach I'll work on a simplified version of the rendering loop in C which only handles the primary rays. Once I have non-recursive, parallel-friendly C code I'll work on the OCL port.

Updated project proposal on Google Melange.

SVN r65153 fails to compile with a bogus unused-variable error. The variable is actually being used; GCC 4.9.1 just fails to figure that out.
- https://sourceforge.net/p/brlcad/bugs/365/

Upgraded Ubuntu and GCC.

Made simple ray generation code in C.
Made simple frame buffer write code in C.
Made simple diffuse shading code in C.
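
As an illustration of these three pieces, here is a minimal sketch, not the code from the renderer patches. It assumes orthographic primary rays looking down the -z axis, unit-length normal and light vectors, and an RGB frame buffer with 3 bytes per pixel. The names gen_ray, lambert and fb_write are hypothetical.

```c
#include <assert.h>

struct ray { double org[3], dir[3]; };   /* hypothetical ray type */

/* Orthographic primary ray through the center of pixel (x, y).
 * The view plane spans [-1, 1] x [-1, 1] at z = 1 (an assumption). */
static struct ray gen_ray(int x, int y, int width, int height)
{
    struct ray r;

    assert(width > 0 && height > 0);
    r.org[0] = (x + 0.5) / width * 2.0 - 1.0;
    r.org[1] = (y + 0.5) / height * 2.0 - 1.0;
    r.org[2] = 1.0;
    r.dir[0] = 0.0;
    r.dir[1] = 0.0;
    r.dir[2] = -1.0;
    return r;
}

/* Lambertian (diffuse) term for a unit normal n and a unit vector l
 * pointing towards the light: max(0, n . l). */
static double lambert(const double n[3], const double l[3])
{
    double d = n[0] * l[0] + n[1] * l[1] + n[2] * l[2];

    return d > 0.0 ? d : 0.0;
}

/* Store a grey intensity in [0, 1] as an RGB pixel in the frame buffer. */
static void fb_write(unsigned char *fb, int width, int x, int y,
                     double intensity)
{
    unsigned char *p = fb + 3 * (y * width + x);
    unsigned char v;

    if (intensity < 0.0) intensity = 0.0;
    if (intensity > 1.0) intensity = 1.0;
    v = (unsigned char)(intensity * 255.0 + 0.5);
    p[0] = p[1] = p[2] = v;
}
```

A renderer built this way is a plain loop over pixels: generate the ray, find the nearest hit, shade it, write the pixel. No recursion is involved, which is what makes it a candidate for a later OCL port.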

Week 3 : 8-14 Jun

Added the main boolean weaving code to our minimal renderer.
Eliminated some gotos and made the code more thread safe.
The simple renderer patches are on the mailing list.

Week 4 : 15-21 Jun

Added OpenMP compile support. Used OpenMP constructs to launch the rendering threads. This work still has some bugs in it.
The alpha M2 patch is on the mailing list.
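
For reference, the usual way to launch rendering threads with OpenMP is a parallel for over scanlines. This is a minimal sketch under that assumption, not the M2 patch itself. shade_pixel is a hypothetical stand-in for the per-pixel ray shot and shading, and the image size is arbitrary.

```c
#include <assert.h>

#define WIDTH  64
#define HEIGHT 48

/* Hypothetical stand-in for shooting and shading one primary ray. */
static unsigned char shade_pixel(int x, int y)
{
    return (unsigned char)((x ^ y) & 0xff);
}

/* Render one frame. Each scanline is an independent unit of work, and
 * every pixel is written by exactly one thread, so no locking is needed. */
static void render_frame(unsigned char *fb)
{
    int y;

    assert(fb != 0);
#pragma omp parallel for schedule(dynamic)
    for (y = 0; y < HEIGHT; y++) {
        int x;

        for (x = 0; x < WIDTH; x++)
            fb[y * WIDTH + x] = shade_pixel(x, y);
    }
}
```

Without OpenMP enabled at compile time (for example -fopenmp with GCC) the pragma is ignored and the loop runs serially with the same result. Dynamic scheduling is a design choice here: scanlines that cross complex geometry take longer, so handing out rows on demand balances the load better than a static split.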

Read code to better understand the main spatial partition construction routines. They seem to be something like this:

rt_prep_parallel() → rt_cut_it() → rt_nugrid_cut()

We need something less complex that is more amenable to porting to OCL. So I will be implementing the Lagae & Dutré compact grid construction algorithm published at EGSR. First I will write it in ANSI C, then I will port the code to OpenCL.

Started work on M3: grid spatial partitioning in OCL.
ANSI C Lagae & Dutré grid construction code.
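
The core idea of a compact grid is to store all cell contents in one flat index array plus one offset per cell, built by counting, accumulating and filling. The following is a minimal sketch in that spirit, not the code from the patch. It assumes unit-sized cells over [0, RES]^3 and axis-aligned bounding boxes per object. RES, struct aabb and build_grid are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define RES    2                      /* cells per axis (hypothetical) */
#define NCELLS (RES * RES * RES)

struct aabb { double min[3], max[3]; };   /* hypothetical bounding box */

/* Clamp a coordinate to a valid cell index (cells have unit size). */
static int cell_of(double x)
{
    int c = (int)x;

    if (c < 0) c = 0;
    if (c > RES - 1) c = RES - 1;
    return c;
}

/* Apply `delta` bookkeeping for every cell overlapped by box b:
 * items == NULL counts, otherwise object index `obj` is stored. */
static void visit_cells(const struct aabb *b, int obj, int *cell_start,
                        int *items)
{
    int x, y, z;

    for (z = cell_of(b->min[2]); z <= cell_of(b->max[2]); z++)
        for (y = cell_of(b->min[1]); y <= cell_of(b->max[1]); y++)
            for (x = cell_of(b->min[0]); x <= cell_of(b->max[0]); x++) {
                int c = x + RES * (y + RES * z);

                if (items)
                    items[--cell_start[c]] = obj;
                else
                    cell_start[c]++;
            }
}

/* Build the grid. On return, the objects overlapping cell c are
 * items[cell_start[c]] .. items[cell_start[c + 1] - 1].
 * The returned array is malloc'ed; the caller frees it. */
static int *build_grid(const struct aabb *boxes, int nboxes,
                       int cell_start[NCELLS + 1])
{
    int *items;
    int i, c;

    assert(nboxes >= 0);
    for (c = 0; c <= NCELLS; c++)
        cell_start[c] = 0;
    for (i = 0; i < nboxes; i++)            /* pass 1: count per cell */
        visit_cells(&boxes[i], i, cell_start, NULL);
    for (c = 1; c < NCELLS; c++)            /* running sum: end offsets */
        cell_start[c] += cell_start[c - 1];
    cell_start[NCELLS] = cell_start[NCELLS - 1];

    items = malloc((cell_start[NCELLS] ? cell_start[NCELLS] : 1)
                   * sizeof *items);
    if (!items)
        return NULL;
    for (i = nboxes - 1; i >= 0; i--)       /* pass 2: fill backwards */
        visit_cells(&boxes[i], i, cell_start, items);
    return items;
}
```

Filling backwards while decrementing the end offsets turns them into start offsets without a separate cursor array. The running sum is the step that maps onto a parallel scan kernel on the GPU.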

Week 5 : 22-28 Jun

Took time off from the project to go to the CGI'15 conference.

Week 6 : 29 Jun-5 Jul

GSoC Midterm Evaluations.

Weeks 7-8 : 6 Jul-12 Jul, 13 Jul-19 Jul

Evaluating algorithms for grid construction.
Selecting OCL kernels we can use to support grid construction. It seems PyOpenCL has some kernels we could use. Now the question is how to extricate the OpenCL/C from the Python...

ANSI C grid traversal code.
OCL grid construction on the GPU.
M3 complete: grid spatial partitioning in OCL.
- https://sourceforge.net/p/brlcad/patches/379/
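
Grid traversal is commonly done with a 3D DDA that steps from cell to cell along the ray. This is a minimal sketch of that technique, not necessarily what the patch does. It assumes unit-sized cells over [0, GRID_N)^3 and a ray origin inside the grid. GRID_N and grid_walk are hypothetical names.

```c
#include <assert.h>
#include <float.h>

#define GRID_N 4    /* cells per axis (hypothetical) */

/* Record, in order, the cells pierced by the ray org + t * dir, t >= 0.
 * Writes at most max_cells entries and returns the number written. */
static int grid_walk(const double org[3], const double dir[3],
                     int cells[][3], int max_cells)
{
    int cell[3], step[3], n = 0, a;
    double t_next[3], t_delta[3];

    for (a = 0; a < 3; a++) {
        assert(org[a] >= 0.0 && org[a] < GRID_N);
        cell[a] = (int)org[a];
        if (dir[a] > 0.0) {
            step[a] = 1;
            t_delta[a] = 1.0 / dir[a];
            t_next[a] = (cell[a] + 1 - org[a]) / dir[a];
        } else if (dir[a] < 0.0) {
            step[a] = -1;
            t_delta[a] = -1.0 / dir[a];
            t_next[a] = (cell[a] - org[a]) / dir[a];
        } else {                 /* parallel to this axis: never crosses */
            step[a] = 0;
            t_delta[a] = DBL_MAX;
            t_next[a] = DBL_MAX;
        }
    }

    while (n < max_cells) {
        cells[n][0] = cell[0];
        cells[n][1] = cell[1];
        cells[n][2] = cell[2];
        n++;

        /* Advance along the axis whose cell boundary is closest. */
        a = 0;
        if (t_next[1] < t_next[a]) a = 1;
        if (t_next[2] < t_next[a]) a = 2;
        if (step[a] == 0)
            break;               /* zero direction vector */
        cell[a] += step[a];
        if (cell[a] < 0 || cell[a] >= GRID_N)
            break;               /* left the grid */
        t_next[a] += t_delta[a];
    }
    return n;
}
```

In a renderer the loop body would intersect the ray with the solids listed for the current cell and stop early once a hit inside that cell is found.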

Weeks 9-10 : 20 Jul-26 Jul, 27 Jul-2 Aug

Implemented GPU side solid database storage infrastructure.
The OCL EHY shot code now uses the GPU solid database instead of creating the input buffers on every call.

The code allows the primitive to decide how it is stored without imposing a convention. So one can use SoA, AoS, or whatever to store the data.
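
To make the two layouts concrete, here is a minimal sketch for sphere data. The struct and function names are hypothetical and the fields are simplified; this is not the layout used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* AoS (array of structures): one record per sphere. */
struct sph_aos {
    double center[3];
    double radius;
};

/* SoA (structure of arrays): one array per field.
 * The arrays are owned by the caller. */
struct sph_soa {
    double *cx, *cy, *cz, *radius;
    size_t count;
};

/* Repack n AoS records into SoA arrays that hold at least n entries. */
static void aos_to_soa(const struct sph_aos *in, size_t n,
                       struct sph_soa *out)
{
    size_t i;

    assert(out->cx && out->cy && out->cz && out->radius);
    for (i = 0; i < n; i++) {
        out->cx[i] = in[i].center[0];
        out->cy[i] = in[i].center[1];
        out->cz[i] = in[i].center[2];
        out->radius[i] = in[i].radius;
    }
    out->count = n;
}
```

AoS keeps everything one shot needs in a single contiguous record, while SoA lets neighboring work items read neighboring addresses of the same field. Which one is faster depends on the device and the access pattern, which is one reason to leave the choice to each primitive.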

Implemented GPU side solid database storage for SPH, ELL, ARB8.
Extracted out some duplicated OCL code.

M4 complete: GPU side database storage of OCL implemented primitives.
- https://sourceforge.net/p/brlcad/patches/392/

OCL TOR (Torus) shot routine. Includes the higher order equation solver code.
OCL TGC (Truncated General Cone) shot routine.
Put equation solver in separate .cl file.

M6 complete: TOR and TGC shot routines in OCL.
- https://sourceforge.net/p/brlcad/patches/393/

General overhaul and cleanup of the OCL shot patches.

Upgraded NVIDIA OpenCL drivers on my computer to CUDA 7.0.

I drew up a tentative algorithm for the CSG raytrace:

# GPU execution
GEN_RAYS(args)
lengths = COMPUTE_LEN_SEGMENTS(rays, db)
segs = ALLOC_SEGMENTS(lengths)
segs = COMPUTE_SEGMENTS(rays, db, segs)

# CPU execution
waiting_segs = READ_SEGMENTS(segs)
# merge with CPU computed segments for non-accelerated primitives
finished_segs = RT_BOOL_WEAVE(waiting_segs)
partitions = RT_BOOL_FINAL(finished_segs)
pixels = VIEWSHADE(rays, db, partitions)

This ultimately lets us reuse the CPU code for boolean weaving, primitive normals, and shaders, giving a 100% pixel-accurate result, at the expense of a lot of memory traffic and CPU-side computation of some fairly math-intensive parts such as normal computation and shading. However, I presently see no other way of getting a 100% accurate result in the time we have available.
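
The GPU half of the algorithm above is a count, allocate, fill pattern: since a kernel cannot grow a buffer, the segments are first counted per ray, then stored at offsets derived from those counts. Here is a minimal serial sketch in C. The solids are stand-in axis-aligned boxes and the rays travel along +z from z = 0, both assumptions chosen to keep the shot test trivial. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct ray { double x, y; };             /* origin (x, y, 0), direction +z */
struct box { double min[3], max[3]; };   /* stand-in solid, min[2] >= 0 */
struct seg { int ray, solid; double t_in, t_out; };

/* Stand-in shot test: does the +z ray pass through the box footprint? */
static int shot_hits(const struct ray *r, const struct box *b)
{
    return r->x >= b->min[0] && r->x <= b->max[0] &&
           r->y >= b->min[1] && r->y <= b->max[1];
}

/* Returns a malloc'ed array of all segments grouped by ray; the caller
 * frees it. offsets must hold nrays + 1 ints. On return the segments of
 * ray i are segs[offsets[i]] .. segs[offsets[i + 1] - 1]. */
static struct seg *compute_segments(const struct ray *rays, int nrays,
                                    const struct box *db, int nsolids,
                                    int *offsets)
{
    struct seg *segs;
    int i, j, sum = 0;

    assert(nrays >= 0 && nsolids >= 0);

    /* Pass 1 (COMPUTE_LEN_SEGMENTS): count the hits of each ray and turn
     * the counts into start offsets with an exclusive running sum. */
    for (i = 0; i < nrays; i++) {
        int len = 0;

        for (j = 0; j < nsolids; j++)
            len += shot_hits(&rays[i], &db[j]);
        offsets[i] = sum;
        sum += len;
    }
    offsets[nrays] = sum;

    /* ALLOC_SEGMENTS: a single allocation sized by the total. */
    segs = malloc((sum ? sum : 1) * sizeof *segs);
    if (!segs)
        return NULL;

    /* Pass 2 (COMPUTE_SEGMENTS): repeat the shots and store the results. */
    for (i = 0; i < nrays; i++) {
        int k = offsets[i];

        for (j = 0; j < nsolids; j++) {
            if (!shot_hits(&rays[i], &db[j]))
                continue;
            segs[k].ray = i;
            segs[k].solid = j;
            segs[k].t_in = db[j].min[2];
            segs[k].t_out = db[j].max[2];
            k++;
        }
    }
    return segs;
}
```

The price of this scheme is that every shot is evaluated twice. In exchange, each ray writes to a disjoint slice of one buffer, so on the GPU the two passes can run with one work item per ray and no synchronization between rays.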

Made OCL EHY shot code look exactly like the ANSI C version. Cleanups.

Ran tests on OCL shot code to check for accuracy vs existing code:

# e.g. test for tgc
make tgc tgc
e tgc ; rt -o rt_tgc.pix # OR e tgc ; rt -o cl_tgc.pix
pixdiff rt_tgc.pix cl_tgc.pix | pix-fb

# test results:
arb8: pixdiff bytes:  777500 matching,    8932 off by 1,       0 off by many
ehy:  pixdiff bytes:  760977 matching,   25443 off by 1,      12 off by many
ell:  pixdiff bytes:  764588 matching,   21844 off by 1,       0 off by many
sph:  pixdiff bytes:  736942 matching,   49490 off by 1,       0 off by many
tgc:  pixdiff bytes:  783191 matching,    3241 off by 1,       0 off by many
tor:  pixdiff bytes:  774138 matching,   12294 off by 1,       0 off by many
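
The three counters in these results classify each byte pair by absolute difference. The sketch below shows that classification in C; it is an illustration of the metric, not the pixdiff source, and diff_bytes is a hypothetical name. Each row above sums to 786432 bytes, which is consistent with a 512x512 image at 3 bytes per pixel.

```c
#include <assert.h>
#include <stddef.h>

struct diff_stats { size_t matching, off_by_1, off_by_many; };

/* Compare two byte buffers of length n and classify every byte pair
 * as identical, differing by exactly 1, or differing by more than 1. */
static struct diff_stats diff_bytes(const unsigned char *a,
                                    const unsigned char *b, size_t n)
{
    struct diff_stats s = {0, 0, 0};
    size_t i;

    assert(a && b);
    for (i = 0; i < n; i++) {
        int d = (int)a[i] - (int)b[i];

        if (d < 0)
            d = -d;
        if (d == 0)
            s.matching++;
        else if (d == 1)
            s.off_by_1++;
        else
            s.off_by_many++;
    }
    return s;
}
```

Off-by-1 bytes are what one expects from last-bit floating point differences between two implementations. Off-by-many bytes point at something that changes a pixel visibly, such as a hit turning into a miss at a silhouette edge.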

Images: RT (EHY), OCL (EHY), PIXDIFF (EHY)

The off-by-many problem with EHY is probably related to rounding errors with sqrt in OCL, with NVIDIA using a different rounding mode (RTE) than X86 (RTP). I tried to use PTX assembly, i.e. asm("sqrt.rp.f64 %0, %1;" : "=r"(disc) : "r"(disc));, to solve it but no dice. The code won't run. OCL 1.1 and later have no support for setting rounding modes without using inline assembly.

Week 11 : 3 Aug-9 Aug

Sean applied patches #341 and #346 to SVN trunk.

Got SVN write access.
Applied patches #393 and #370 to SVN trunk.
SVN trunk now has OCL shot evaluation for SPH, EHY, ELL, ARB8, TOR, TGC primitives.
Refactored SPH (remove duplicate code, etc) and applied it to trunk.
Move declarations to top level in order to eliminate duplicate code in trunk.

Pass struct with primitive data to OCL as an initial step to an AoS device primitive database. Move constants into common.cl.
Generic OCL solid shot handler. Refactored code to remove duplicates.
Load large OCL vectors on demand to reduce stack footprint per function call.
Fix memory leak on OCL loaded program source code.

Week 12 : 10 Aug-16 Aug

Add inclusive and exclusive scan OCL code from PyOpenCL to trunk.

Created a private branch for opencl.

M2 committed to opencl branch: kludge up a simple rendering pipeline with grid spatial partitioning traversal acceleration.

The simple ANSI C rendering pipeline only supports Lambertian reflection with a stock grey material to make things simpler. Example output for goliath.g:

Primitive	Solids
arb8	160
ell	2
pipe	10
tgc	94
tor	22

We have OCL shot tests for 278 of 288 solids. The only primitive in this database which is not ported yet is pipe.

For future reference I get these timings for the above scene (one OCL kernel invocation per ray-primitive shot):

SHOT: cpu = 421.568 sec, elapsed = 447.675 sec

M4 committed to opencl branch: add device side solid database storage.
