REM ***** BASIC *****
REM These OpenOffice.org (OOo) macros will allow you to segment a text to a sentence level instead of paragraphs.
REM It is meant to prepare text for translation with the OmegaT CAT if you prefer sentence segmentation.
REM
REM WHAT'S NEW IN V. 0.3
REM New macro to combine both segmentation rules for both 1 and 2 spaces after punctuation (this is for messy texts mixing
REM both styles)
REM Better looking code.
REM KNOWN PROBLEM: it seems a bug in the search replace feature
REM of OpenOffice v. 1.0.1 generates an error.
REM Please use with v. 1.1.0 or above.
REM
REM SETUP:
REM a - in OpenOffice.org, Tools menu > Macros > Macro.
REM b - select or create a module (default is Module1, just as good as any), and click Edit.
REM c - in the BASIC window, go to the end of the file, and paste this whole text.
REM d - save and close. You can now use the macros by Tools menu > Macros > Macro,
REM     select a macro and click Run.
REM e - it is possible to add keyboard shortcuts or buttons to run macros easily in OOo.
REM
REM USAGE:
REM 1 - To prepare a file (i.e. segment it into sentences):
REM   1a - open it in OpenOffice.org.
REM   1b - if sentences in your SOURCE text are separated by dot-1 space,
REM        run the Segment1Space macro on your source text.
REM   1c - if sentences in your SOURCE text are separated by dot-2 spaces,
REM        run the Segment2Spaces macro on your source text.
REM 2 - translate your text and compile it.
REM 3 - To finish a file (i.e. restore the original segmentation):
REM   3a - open it in OpenOffice.org.
REM   3b - if you want sentences in your TARGET text to be separated by dot-1 space,
REM        run the UnSegment1Space macro on your target text.
REM   3c - if you want sentences in your TARGET text to be separated by dot-2 spaces,
REM        run the UnSegment2Spaces macro on your target text.
REM 4 - that's it!
REM NOTE: the UnSegment macros might be a bit slow on a big document.
REM NOTE: OOo's repagination feature can take a while to catch up after running these macros.
REM
REM Version history:
REM 0.1: (2004/03/06) First release - Benjamin Siband
REM 0.2: (2004/03/06) Cosmetic changes - Benjamin Siband
REM   - changed the loop's findFirst into findNext
REM   - cleaned the code of unnecessary structure declarations
REM 0.3: (2004/03/12) Optimization + New function - Benjamin Siband
REM   - optimization: the common code for the segmentation macros has been extracted into a new procedure
REM   - new segmentation macro for both 1 and 2 spaces after punctuation (this is for messy texts mixing
REM     both styles)
REM
REM Ideas for future development:
REM   - language versions of the macros (esp. for the list of abbr., but also segmentation rules).
REM     Please provide special requirements.
REM   - read the list of abbreviations from an external file
REM   - auto-installer
REM
REM ***** DISCLAIMER *****
REM The author has done some basic testing on these macros, and they seem to work well for him.
REM Always keep backups of your files before you run these macros!!!
REM The author shall not be held responsible if anything goes wrong with the use of these macros.
REM Testing has been done with OpenOffice.org v. 1.1.0, English version, on Windows XP, English US version.
REM
REM Please send any comments/bugs/suggestions for improvement to the OmegaT @ Yahoo groups mailing list.

Const SINGLESPACING = 1
Const DOUBLESPACING = 2
Const BOTHSPACING = 3

Global gAbbr As String

Sub Main
	rem ----------------------------------------------------------------------
	rem Mark common abbr. (M., Mr., Mrs., Ms., Dr.) for "no segmentation"
	rem ----------------------------------------------------------------------
	rem Add new abbr. here. Do not forget "\<" for "beginning of word",
	rem and "\." for the dot.
	gAbbr = "\