Name

remrt — network distributed (remote) ray-trace dispatcher

Synopsis

remrt [options...] model.g objects...

DESCRIPTION

remrt is intended to be an entirely plug-compatible replacement for the rt(1) program. While rt can operate on more than one processor of a shared memory MIMD style multi-processor, it can not spread the work out across the network. remrt and its companion program rtsrv(1) implement a libpkg(3)-based protocol for distributing the task of ray-traced rendering out to an arbitrary number of processors on the network. Those processors can be from an arbitrary mix of vendors, with differing instruction sets and data representations, operating at widely different speeds.

remrt provides two levels of fault-tolerance in its distributed computation. First, any rtsrv processor which fails (or whose network link fails) will have its work reassigned automatically by remrt to other processors, so that the failure of that processor does not hold up the progress of the computation. On a regular interval remrt attempts to restart computation on failed processors marked "active", so that when failed systems become available again they are put back to work. Second, remrt offers a form of "checkpoint-restart" by carefully writing all received pixel data immediately to disk. If the machine running remrt should fail, when remrt is restarted, it will determine what work remains to be done and restart all the rtsrv processes on the remote machines.

remrt takes exactly the same set of command line arguments as rt(1). For a full discussion of those options, consult the rt(1) manual page. From within mged(1) the remrt program can be run from the current mged view by giving the mged command



    rrt remrt -M -s###

  

where "###" is the image resolution desired.

remrt depends on the Berkeley rsh(1) remote shell command and the user's corresponding ".rhosts" file to automatically initiate remote computation.

remrt requires a control file called ".remrtrc" which specifies a few essential parameters of the server machines (network hosts) that are to participate in the distributed computation. The ".remrtrc" file contains a list of normal remrt commands which are to be executed before distributed processing begins. This can be useful for establishing a variety of user-specific defaults. However, the most important command to provide is a "host" command to describe each of the remote hosts that will (or may) participate in the distributed computation. Here is a sample ".remrtrc" file:



    host wolf passive cd /tmp/cad/.remrt.4d

    host voyage always cd /tmp/cad/.remrt.4d

    host whiz hacknight convert /tmp

  

The first argument to the "host" command is the name of the server machine. It will be converted to a fully qualified name by calling gethostbyname(3) and gethostbyaddr(3).

The second argument to the "host" command describes "when" this machine should be used.

always

says that whenever this host is not participating in the computation, repeated attempts should be made (every 5 minutes) to start an rtsrv process on that machine. This option is most useful for machines where the system administrator and/or owner of the machine won't mind your using the machine at any time. (Note that rtsrv runs at the lowest priority by default, so this isn't too terribly uncivilized. But be polite and ask first.)

night

indicates that the machine should only be used during the night and on weekends. Night extends from 1800 through 0000 to 0800. Saturday and Sunday are considered to be in night time for the entire day.

hacknight

indicates that the machine is owned by a late-night "hacker", and can be used while that hacker is likely to be asleep. Hacking tends to run from 1300 through 0000 to 0400, so hacknight is the time period from 0400 to 1300.

passive

indicates that remrt should never attempt to initiate an rtsrv process on this machine, but if the user should happen to start rtsrv manually on the machine and direct it to join the computational fray, then the necessary information will be available.

If an unlisted machine should join the fray, it will be added with a "when" value of passive and a "where" value of convert. When an rtsrv passes from night into day, it is automatically terminated by remrt at the appointed time.
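The "when" windows described above can be written down as a small shell function. This is only a sketch: when_allows is a hypothetical helper, not part of remrt, and it assumes that start hours are inclusive, end hours are exclusive, and that hacknight applies on every day of the week (the text does not say). The day of week is numbered 0 (Sunday) through 6 (Saturday), as printed by date +%w.

```sh
# Hypothetical helper, not part of remrt: report whether a "when" value
# would permit starting rtsrv at a given day of week (0=Sunday .. 6=Saturday)
# and hour (0-23). Boundary handling is an assumption.
when_allows() {
    when=$1
    dow=$2
    hour=$3
    case $when in
    always)
        echo yes ;;
    passive)
        # remrt never starts rtsrv on a passive host itself
        echo no ;;
    night)
        # all of Saturday and Sunday, otherwise 1800 through 0000 to 0800
        if [ "$dow" -eq 0 ] || [ "$dow" -eq 6 ] ||
           [ "$hour" -ge 18 ] || [ "$hour" -lt 8 ]; then
            echo yes
        else
            echo no
        fi ;;
    hacknight)
        # 0400 to 1300
        if [ "$hour" -ge 4 ] && [ "$hour" -lt 13 ]; then
            echo yes
        else
            echo no
        fi ;;
    *)
        echo unknown ;;
    esac
}

when_allows night 3 19       # Wednesday at 1900
when_allows night 3 12       # Wednesday at noon
when_allows hacknight 3 5    # Wednesday at 0500
```

The three calls cover a weekday evening, a weekday midday, and the early-morning hacknight window.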

The third argument to the "host" command is the "where" parameter, which indicates where the BRL-CAD ".g" file (and any related texture maps) can be found.

cd

indicates that the rtsrv program should change directory (see cd(2)) to the indicated directory path, and should just read the data files in that directory as rt normally would. This can be used to specify NFS or RFS mounted remote directories, or static copies of the binary database file(s) needed to perform the ray-tracing. This is the most efficient way of operating rtsrv, but it also requires some manual preparation on the part of the user to install all the required files in the designated location.

convert

indicates that remrt should send across an ASCII machine-independent version of the ".g" database file using the command:



  g2asc < file.g | rsh host 'cd directory; asc2g > file.g'

before starting up the rtsrv program in that same directory. This relieves the user of the burden of setting up the ".g" database file, but suffers from several drawbacks. First, the transmission of a large database can take a noticeable amount of time. Second, should the server host go down and then re-join the fray, the database will be sent again, because remrt has no easy way to tell if the previous ".g" file is still intact after the crash/restart. Third, remrt has no way to tell what auxiliary files might be needed for this ".g" file, and thus can not send them automatically. If the ".g" file references height field, extruded bit-map, or volumetric solids, the associated data files will not be present on a convert mode server. The same applies for texture map and bump map files.
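To make the layout of a "host" line concrete, the following sketch writes the sample control file to a scratch location and splits each line into its four fields. The function summarize_hosts is a hypothetical helper for illustration and says nothing about how remrt itself parses the file; the host names and paths are simply those of the sample.

```sh
# Write the sample control file to a scratch location. Host names and
# paths are placeholders taken from the sample ".remrtrc".
rc=$(mktemp)
cat > "$rc" <<'EOF'
host wolf passive cd /tmp/cad/.remrt.4d
host voyage always cd /tmp/cad/.remrt.4d
host whiz hacknight convert /tmp
EOF

# Hypothetical helper: label the name, "when", "where" and directory
# fields of every "host" line, skipping any other commands in the file.
summarize_hosts() {
    while read -r cmd name when where dir; do
        if [ "$cmd" = host ]; then
            echo "$name: when=$when where=$where dir=$dir"
        fi
    done < "$1"
}

summarize_hosts "$rc"
rm -f "$rc"
```

Each output line names one server together with when it may be used and where it will look for the database.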

remrt uses several different strategies for optimizing the dispatching of work. These can be controlled by the "allocate" command. If "allocate frame" has been specified, then the work allocation method is a "free for all", allocating work from one frame at a time to all servers as they become ready for more. This maximizes the CPU overhead for prepping (because all CPUs will prep all frames), but it also provides the shortest wall-clock time to getting the first frame finished. This mode is recommended for demonstrations, and other situations where people are sitting around waiting for results to appear on the screen.

If the "allocate load" command has been given, then new servers will not be assigned to a given frame unless there is at least enough work remaining on that frame to require 10 percent of that server's measured performance capacity. Otherwise the server is started on a subsequent frame.

If the "allocate movie" command is given, then each server is allocated a whole frame to do. This minimizes the CPU time spent in the overhead of prepping the frame, and tends to maximize overall throughput, at the price of making you wait a long time for the first frame to finish. This mode is best used when crunching out large numbers of frames for an animation. (See also the scriptsort(1) command for a clever power-of-two script rearrangement technique first proposed by Jim Blinn).
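The difference in prep overhead between "allocate frame" and "allocate movie" can be illustrated with some rough arithmetic. The numbers below are invented for illustration and do not come from a real run; the sketch also assumes that every server takes the same time to prep a frame.

```sh
# Illustrative numbers only; none of these come from a real run.
frames=100      # frames in the animation
servers=8       # participating rtsrv servers
prep_sec=30     # assumed prep time for one frame on one server

# "allocate frame": every server preps every frame.
frame_mode=$((frames * servers * prep_sec))

# "allocate movie": each frame is prepped once, by the server that owns it.
movie_mode=$((frames * prep_sec))

echo "allocate frame: $frame_mode CPU seconds of prep"
echo "allocate movie: $movie_mode CPU seconds of prep"
```

Under these assumptions the prep cost in "allocate frame" mode grows with the number of servers, which is the price paid for getting the first frame finished sooner.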

The output can be stored either in a file, or sent to the current framebuffer, the same as with rt(1).

When remrt is run without command line arguments, it enters an interactive mode. In this mode, the current framebuffer can be assigned and released, frames can be added to the list of pending work, and many other control and status commands can be used.

Normally remrt runs in a batch mode, taking all its information from the command line, the script file on stdin, and the ".remrtrc" file. If you wish to issue an interactive-style command to remrt while it is running in batch mode, this can be accomplished by giving an extra argument to rtsrv, which will simply pass on the command and exit.

For example:



    rtsrv server_host 4446 status

  

would send the command "status" to the remrt process running on the machine called "server_host" and listening at port 4446. 4446 is the port used by the first copy of remrt running on a machine. If a second copy of remrt is started while the first one continues to run, it will be assigned port 4447. One is added to the port number repeatedly until an available port is found. Normally you do not need to worry about which port is being used, unless you wish to send commands there directly. The netstat(1) command can sometimes be useful to track down which ports are being used.
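The port search can be pictured with a short simulation. This is a sketch only: next_free_port is a hypothetical function, and the list of busy ports is passed in as arguments rather than discovered from the network. How remrt itself detects that a port is in use is not described here.

```sh
# Hypothetical illustration of the port search: start at the given port
# and add one until a port is found that is not in the list of busy
# ports supplied as the remaining arguments.
next_free_port() {
    port=$1
    shift
    while :; do
        busy=no
        for p in "$@"; do
            if [ "$p" -eq "$port" ]; then
                busy=yes
            fi
        done
        if [ "$busy" = no ]; then
            break
        fi
        port=$((port + 1))
    done
    echo "$port"
}

next_free_port 4446               # no other remrt running
next_free_port 4446 4446 4447     # two copies already running
```

With no other copy running, the search stops at 4446; with 4446 and 4447 taken, it stops at 4448.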

Example 1. Performing a Basic remrt/rtsrv Raytrace

The following steps will set up a local (single machine) example remrt session. This does not utilize the full power of the remrt system, but will illustrate how the individual pieces relate to each other. Rather than use rsh, each process will be launched manually.

Launch an fbserv instance on the desired port and with the desired framebuffer type. We will use port 0 and /dev/X for a Linux-based X11 rendering; Windows users can try /dev/wgl. For a somewhat larger render we specify a size of 2048 pixels square.

fbserv -s2048 -p0 -F/dev/X

If a graphical framebuffer was specified, a window should appear. Otherwise, fbserv will silently wait for input.

In MGED, set up the scene you wish to render and then save a script with the saveview command:

saveview remrt.rt

Edit the script and replace the rt command with remrt. Replace the output file specification (-o filename) with a -F0 option to point the render to the fbserv started in the previous step. Also add a -s2048 option to specify a rendering size to match that provided to the fbserv.
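The edit can also be scripted. The sketch below uses sed on a simplified stand-in for a saveview script; the exact layout that saveview writes is an assumption here and may differ on your system, so check the result before running it. The function name to_remrt and the file names in the sample are hypothetical.

```sh
# Hypothetical filter: rewrite a saveview-style script read on stdin so
# that it runs remrt and renders to framebuffer 0 at 2048 pixels square.
to_remrt() {
    # 1st expression: replace the rt command at the start of a line.
    # 2nd expression: replace "-o filename" with "-F0 -s2048".
    sed -e 's/^rt /remrt /' \
        -e 's/-o *[^ \\]*/-F0 -s2048/'
}

# Simplified stand-in for the contents of a saveview script
# (an assumption; real saveview output may be laid out differently).
to_remrt <<'EOF'
rt -M \
 -o scene.pix\
 $*\
 'model.g'\
 'all.g'
EOF
```

In practice the filter would be applied to a copy of the saved script, leaving the original untouched until the result has been inspected.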

Run the script to launch remrt. You should see output indicating the program has launched. (Note: the port number printed by the default output is not useful - you want to use 4446 by default to connect to remrt.)

./remrt.rt



08/22 21:51:41 machinename BRL-CAD Release 7.31.0  Network-Distributed RT (REMRT)

    Sat, 22 Aug 2020 11:24:22 -0400, Compilation 1

    user@machinename



pkg_permserver(rtsrv, 8): unknown service

08/22 21:51:41 Automatic REMRT on machinename

08/22 21:51:41 Assigned LIBPKG permport 24081

08/22 21:51:41 Listening at TCP port 4446

08/22 21:51:41 Reading script on stdin

08/22 21:51:41 Starting to scan animation script

08/22 21:51:41 Animation script loaded

08/22 21:51:41 Worker assignment interval=5 seconds:

   Server   Last  Last   Average  Cur   Machine

    State   Lump Elapsed pix/sec Frame   Name

  -------- ----- ------- ------- ----- -------------

08/22 21:51:41 Seeking servers to start

    

Note that it is seeking servers - we have not given remrt any instructions on how to start its own. To start a server and see the system work, we set up a local directory containing the same .g file we used to set up the scene, and from that directory run rtsrv. (For these purposes we also add the -d so we can see more of what is happening. Without this option, if anything isn't set up correctly rtsrv can fail with cryptic errors.)

rtsrv -d machinename 4446

If successful, both remrt and rtsrv will begin producing output. The rtsrv output will be more verbose, and should look like the following:



PIXELS fr=0 pix=3588096..3592191, rays=3712, cpu=0.153336

ph_enqueue: 3592192 3596287 0

PIXELS fr=0 pix=3592192..3596287, rays=3856, cpu=0.143813

ph_enqueue: 3596288 3600383 0

PIXELS fr=0 pix=3596288..3600383, rays=3456, cpu=0.145104

ph_enqueue: 3600384 3604479 0

PIXELS fr=0 pix=3600384..3604479, rays=4032, cpu=0.145221

ph_enqueue: 3604480 3608575 0

PIXELS fr=0 pix=3604480..3608575, rays=3776, cpu=0.141353

ph_enqueue: 3608576 3612671 0

PIXELS fr=0 pix=3608576..3612671, rays=3696, cpu=0.13579

ph_enqueue: 3612672 3616767 0

    

Upon completion, remrt will notify the user and exit. For this example, the full remrt output is:



08/22 21:51:41 machinename BRL-CAD Release 7.31.0  Network-Distributed RT (REMRT)

    Sat, 22 Aug 2020 11:24:22 -0400, Compilation 1

    user@machinename



pkg_permserver(rtsrv, 8): unknown service

08/22 21:51:41 Automatic REMRT on machinename

08/22 21:51:41 Listening at port 24081, reading script on stdin

08/22 21:51:41 Starting to scan animation script

08/22 21:51:41 Animation script loaded

08/22 21:51:41 Worker assignment interval=5 seconds:

   Server   Last  Last   Average  Cur   Machine

    State   Lump Elapsed pix/sec Frame   Name 

  -------- ----- ------- ------- ----- -------------

08/22 21:51:41 Seeking servers to start

host_lookup_by_hostent(localhost) got localhost?

08/22 21:54:32 127.0.0.1: using 12 of 12 cpus

08/22 21:54:32 127.0.0.1 dirbuild OK (Output from STEP converter step-g.)

08/22 21:54:32 127.0.0.1: process_cmd 'opt -w2048 -n2048 -H0 -p0 -U0 -J0 -A0.4 -l0 -E1.41421 -x0 -X0 -T5.000000e-04/0.000000e+00'

08/22 21:54:32 127.0.0.1: Using tolerance 0

08/22 21:54:32 127.0.0.1: process_cmd 'viewsize 9.86571722597487e+02'

08/22 21:54:32 127.0.0.1: process_cmd 'orientation 2.48097349045873e-01 4.76590573266048e-01 7.48097349045873e-01 3.89434830518390e-01'

08/22 21:54:32 127.0.0.1: process_cmd 'eye_pt 1.01790192500394e+04 6.80791388110900e+03 5.76220366705708e+03'

08/22 21:54:32 127.0.0.1: process_cmd 'clean'

08/22 21:54:33 127.0.0.1: BRL-CAD Release 7.31.0  The BRL-CAD Optical Shader Library

    Sat, 22 Aug 2020 11:24:22 -0400, Compilation 1

    user@machinename

08/22 21:54:33 127.0.0.1: PREP: cpu = 9.7e-05 sec, elapsed = 0.000109 sec

    parent: 0.0user 0.0sys 0:00real 0% i+d maxrss +pf csw

  children: 0.0user 0.0sys 0:00real 0% i+d maxrss +pf csw

08/22 21:54:33 127.0.0.1: NUBSP: 0 cut, 1 box (1 empty)

08/22 21:54:33 127.0.0.1 gettrees OK (Document)

08/22 21:54:34 127.0.0.1: shade_inputs(Brep_1.s) flip N xy=1292, 238 ID_BREP surf=125 dot=0.422618

08/22 21:54:34 127.0.0.1: center: 10236.048570366667 7005.4508761769466 5419.0434225048521

08/22 21:54:34 127.0.0.1: dir: -0.74240387650610384 -0.51983679072568423 -0.42261826174069994

08/22 21:54:35 127.0.0.1: shade_inputs(Brep_1.s) flip N xy=1090, 331 ID_BREP surf=37 dot=0.422618

08/22 21:54:35 127.0.0.1: center: 10276.352963334217 6914.8807626523731 5459.6463522037802

08/22 21:54:35 127.0.0.1: dir: -0.74240387650610384 -0.51983679072568423 -0.42261826174069994

08/22 21:55:58 127.0.0.1: shade_inputs(Brep_1.s) flip N xy=1069, 1335 ID_BREP surf=171 dot=0.422618

08/22 21:55:58 127.0.0.1: center: 10114.720787667989 6789.3550779060624 5897.983356695433

08/22 21:55:58 127.0.0.1: dir: -0.74240387650610384 -0.51983679072568423 -0.42261826174069994

08/22 21:56:16 Frame 0 DONE: 103.085 elapsed sec, 3890128 rays/1189.42 cpu sec

08/22 21:56:16 RTFM=37737.1 rays/sec (3270.61 rays/cpu sec)

08/22 21:56:16 All work done!

08/22 21:56:16 Task accomplished

    

The framebuffer started at the beginning will now hold the image results. To save those results to a file, use the fb-png command:

fb-png -F0 -s2048 image.png



SEE ALSO

rtsrv(1), rt(1), scriptsort(1), brlcad(1), mged(1), lgt(1), pix-fb(1), rtray(1), rtpp(1), librt(3), ray(5V), pix(5)

DIAGNOSTICS

Numerous error conditions are possible. Descriptive messages are printed on standard error.

SEE ALSO

M. Muuss, "Workstations, Networking, Distributed Graphics, and Parallel Processing", in "Computer Graphics Techniques: Theory and Practice", ed: Rogers & Earnshaw, Springer Verlag, New York, pages 409-472

BUGS

Most deficiencies observed while using the remrt program lie in the rt(1) program, with which it shares a substantial amount of common code, or in the librt(3) library. If a frame fails to render properly, try processing it on a single machine using rt(1) to determine whether the problem is in the ray-tracing side of things or the distributed computation side of things.

AUTHOR

BRL-CAD Team

COPYRIGHT

This software is Copyright (c) 1994-2020 by the United States Government as represented by U.S. Army Research Laboratory.

BUG REPORTS

Reports of bugs or problems should be submitted via electronic mail to