Transcribe Geometry Model Data from a PDF report to an ASCII file
BRL-CAD
Status: Closed
Time to complete: 72 hrs
Mentors: Sean

We have scans (PDF) of a number of reports documenting early geometric models in the COMGEOM format (a now obsolete format, but the models are interesting nonetheless). These reports contain the actual geometry defining each model, printed as pages and pages of numbers and letters. Unfortunately, the scan quality is poor enough that optical character recognition (OCR) has a very high error rate.

This task is to manually transcribe the MEP-021A Generator Set model described in the report "A Combinatorial Geometry Computer Description of the MEP-021A Generator Set" (see the References list below for a link to download the PDF). One possible approach is to use Acrobat Reader or another PDF reader to select and copy the OCR text, paste it into a text file as a starting point, and then correct it by hand. There may also be patterns that allow semi-automated processing (for example, if runs of five zeros are commonly rendered with the letter "O" instead of the digit 0, a search and replace is in order). Any approach is fine, but remember that the goal is not just to extract the OCR text but to produce an accurate transcription of the file. The OCR text can be used as a starting point, but it will NOT be accurate.
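As a minimal sketch of the kind of semi-automated cleanup described above, the Python function below substitutes digits for letters that OCR commonly confuses with them (O/o for 0, l/I for 1). It only touches whitespace-delimited tokens that already contain a real digit and otherwise consist of an optional sign, digits, and periods, so keywords such as solid type names are left alone and the original spacing is preserved. The substitution table and the name `fix_numeric_tokens` are hypothetical; the actual confusions should be worked out by comparing the OCR output against the scanned pages.

```python
import re

# Hypothetical table of letters that OCR often confuses with digits.
# Verify against the actual scan before relying on it.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# A "numeric-looking" token: optional sign, then only digits, periods,
# and the confusable letters above.
NUMERIC_LIKE = re.compile(r"[+-]?[0-9OolI.]+")


def fix_numeric_tokens(line: str) -> str:
    """Fix confusable letters inside numeric-looking tokens only.

    Whitespace is preserved exactly, which matters if the input is
    column-sensitive.
    """
    def repl(match: re.Match) -> str:
        token = match.group(0)
        # Require at least one real digit so tokens made only of O, o, l, I
        # are left untouched.
        if NUMERIC_LIKE.fullmatch(token) and any(c.isdigit() for c in token):
            return token.translate(OCR_DIGIT_FIXES)
        return token

    return re.sub(r"\S+", repl, line)


if __name__ == "__main__":
    print(fix_numeric_tokens("RPP  1O.5 -2.l  BOX"))
```

For example, `fix_numeric_tokens("RPP  1O.5 -2.l  BOX")` returns `"RPP  10.5 -2.1  BOX"`. A substitution like this only removes one class of error, so every changed value should still be checked against the scan.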

The goal is a file that can be fed to BRL-CAD's comgeom-g importer to generate an accurate .g file. If the generator conversion goes well, there are a significant number of other models of interest (some of them considerably more complex) that can be used to create more tasks of this nature.
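Because the OCR text will not be accurate, one way to focus manual review before handing a file to comgeom-g is to list every token that mixes letters and digits, a common signature of a misread number. The Python sketch below does this for plain-text or comma-separated lines. The function name `flag_suspicious` and the heuristic itself are assumptions, not part of BRL-CAD; legitimate alphanumeric names (and exponents such as `1E-3`, if the data uses them) will be flagged too.

```python
import re

# Heuristic (an assumption): a token that contains both a digit and a letter
# is a likely OCR misread such as "1O.5" or "4B". Tokens are runs of
# characters other than whitespace and commas, so CSV cells are split too.
SUSPICIOUS = re.compile(r"(?=[^\s,]*\d)(?=[^\s,]*[A-Za-z])[^\s,]+")


def flag_suspicious(lines):
    """Yield (line_number, token) for each mixed letter/digit token."""
    for lineno, line in enumerate(lines, start=1):
        for match in SUSPICIOUS.finditer(line):
            yield lineno, match.group(0)


if __name__ == "__main__":
    sample = ["RPP 1O.5 2.0", "BOX,3.5,4B"]
    for lineno, token in flag_suspicious(sample):
        print(f"line {lineno}: {token}")
```

Run against the sample above, it reports `1O.5` on line 1 and `4B` on line 2, leaving pure numbers and pure keywords alone. The output is a review list, not a correction; each flagged token still has to be compared with the scan.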

References:

  • http://www.dtic.mil/docs/citations/ADA073408

Code:

  • src/conv/comgeom

Please discuss your progress with the developers. This task could be broken up into multiple tasks, depending on the time and accuracy of your conversion.

Uploaded Work
File name/URL            File size   Date submitted
ADA073408-Table1.csv     35.0 KB     November 22 2013 18:01 UTC
ADA073408-Table2.csv     9.9 KB      November 22 2013 18:02 UTC
ADA073408-Table3.csv     6.9 KB      November 22 2013 18:02 UTC
Comments
Jacob Burroughson on November 21 2013 17:58 UTC: Task Claimed

I would like to work on this task.

Mandeep Kaur on November 21 2013 17:59 UTC: Task Assigned

This task has been assigned to Jacob B. You have 72 hours to complete this task, good luck!

Jacob Burroughson on November 22 2013 18:02 UTC: Ready for review

The work on this task is ready to be reviewed.

Sean on November 23 2013 01:18 UTC: no idea

No idea if it's right, but it looks good spot-checking.  Interested in a follow-up task to turn them into input files and attempt an import conversion?

Sean on November 23 2013 01:18 UTC: Task Closed

Congratulations, this task has been completed successfully.