GoFiler Legato Script Reference
Legato v 1.6d Application v 6.1b
Chapter Five — General Functions (continued)
The Legato API provides a number of SGML-style parsing options. The XML Parse Object is based on the internal XML Mini Parse. It is a lightweight parser that performs very little error checking and does not employ a DTD or schema. As such, it is well suited for fast parsing of XML and HTML.
The parse object is based on the Word Parse Object using the WP_SGML_TAG parse mode, combined with the SGML tag API functions, with glue code and an interface, to provide a simple and effective parser.
Unlike the SGML object, the XML Parse Object does not directly parse CSS property value pairs. That is left to the programmer.
The parser does not require any previous knowledge or setup for concept names, namespaces, or tag and data structure. All of those topics and tasks are left to the developer.
An object is created using the XMLParseCreate function. Data can be sourced from several common objects such as the Mapped Text, Edit and File Objects, directly from a file, or directly from a string. Creating the object does not parse the data; the only extra work done on that initial call is for a direct file, which is loaded into memory but not processed.
Basic operation is to load the source and then step from item to item using the XMLParseGetItem function. XMLParseGetItem can operate in raw mode or normal mode. Raw mode allows raw tags to be loaded and examined; however, a number of API functions will not operate because the data has not been ‘cracked’. In normal mode, when a parsed item is detected as a tag, it is loaded into the element, concept and prefix values and parsed into the internal attribute array. These actions cut up the parse buffer, so the item returned from XMLParseGetItem will contain only the element as a tag.
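To make the idea of ‘cracking’ a tag concrete, the following Python sketch splits a raw tag into a name, a prefix and an attribute table. It is a conceptual model only, not Legato code and not the parser's actual implementation. The function name `crack_tag`, the dictionary keys, and the rule that the prefix is whatever precedes the last colon are all assumptions made for illustration; how the real object fills its element, concept and prefix values is not specified here.

```python
import re

# Attribute name, then an optional value that is double-quoted,
# single-quoted, or bare. (Illustrative pattern, not the parser's own rules.)
_ATTR = re.compile(r'''([^\s=/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?''')


def crack_tag(raw: str) -> dict:
    """Split a raw tag such as '<a href="x">' into its parts (hypothetical helper)."""
    body = raw.strip()[1:-1].strip()      # drop the '<' and '>' chevrons
    closing = body.startswith("/")        # '</p>' style closing tag
    body = body.lstrip("/").rstrip("/").strip()

    parts = body.split(None, 1)           # name, then the attribute text
    name = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""

    # Assumption: the prefix is everything before the last colon in the name.
    prefix, _, local = name.rpartition(":")

    attributes = {}
    for match in _ATTR.finditer(rest):
        # Take whichever value form matched; valueless attributes get "".
        value = next((g for g in match.groups()[1:] if g is not None), "")
        attributes[match.group(1)] = value

    return {
        "element": name,
        "prefix": prefix,
        "local": local,
        "closing": closing,
        "attributes": attributes,
    }
```

Like the parser described above, this sketch does no error checking: a tag with a missing closing quote is split as well as it can be rather than reported.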
As data is parsed, each item is classified into one of four categories: tags, declarations, comments and other. Tags are detected simply by looking at the first character and testing for the open chevron (‘<’). Once detected, comments and declarations are further refined by looking for ‘<!--’ or simply ‘<!’, respectively. Comments and declarations are always treated as raw items, not cracked, regardless of the raw mode setting. Other items are non-structured text, including character entities, whether named or given as character positions.
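The classification rule above can be sketched in a few lines of Python. This is an illustration of the described test order, not Legato code; the function name `classify_item` and the returned category strings are hypothetical.

```python
def classify_item(item: str) -> str:
    """Classify one parsed item by its leading characters (hypothetical helper)."""
    if not item.startswith("<"):
        return "other"        # non-structured text, including character entities
    if item.startswith("<!--"):
        return "comment"      # tested first because '<!--' also begins with '<!'
    if item.startswith("<!"):
        return "declaration"
    return "tag"
```

The comment test has to come before the declaration test, since every comment also satisfies the looser ‘<!’ check.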
Reported parse positions differ depending on the source of the material. For objects and files, the reported position will be in zero-based X/Y (character and line) positions. For a supplied string, the reported positions will be only in zero-based X positions, with the Y position always reported as zero. If X/Y positions are desired for a string, open the string with the CreateMappedTextString function and then use the Mapped Text Object handle to create the XML Parse Object.
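The difference between the two position styles can be illustrated with a small Python sketch. It assumes that the X position reported for a plain string is the character offset from the start of the string, and that lines are separated by a newline character; neither detail is stated above. The helper name `to_xy` is hypothetical and is not part of the Legato API.

```python
def to_xy(text: str, offset: int) -> tuple:
    """Convert a zero-based offset into zero-based (x, y) = (character, line)."""
    y = text.count("\n", 0, offset)               # line breaks before the offset
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when on the first line
    return offset - line_start, y


source = "<a>\n  <b/>\n</a>"
offset = source.index("<b/>")   # string-style position: X only, Y treated as 0
x, y = to_xy(source, offset)    # line-mapped position: X within the line, plus Y
```

Here the string-style position of `<b/>` is 6, while the line-mapped position is X = 2 on line Y = 1.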
Unlike the SGML Object, spaces are skipped and accumulated in a space buffer. If the only interest is whether a word space was encountered prior to an item, the XMLParseHasLeadingSpace function can be used. If the white space is important to the operation, the XMLParseGetSpace function can be used to retrieve the actual space characters.
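The space-buffer behavior can be modeled with a short Python generator that skips white space, remembers it, and hands it back alongside the next item. This is a conceptual sketch rather than Legato code. The name `iter_items` is hypothetical, and the item boundaries (a tag runs to the next ‘>’, other text runs to the next space or ‘<’) are simplifying assumptions.

```python
def iter_items(text: str):
    """Yield (leading_space, item) pairs from text (hypothetical helper)."""
    i, n = 0, len(text)
    while i < n:
        # Skip white space and keep it, as a space buffer would.
        start = i
        while i < n and text[i].isspace():
            i += 1
        space = text[start:i]
        if i >= n:
            break                              # trailing space with no item after it

        if text[i] == "<":
            end = text.find(">", i)
            end = n if end < 0 else end + 1    # no error recovery: run to the end
        else:
            end = i
            while end < n and not text[end].isspace() and text[end] != "<":
                end += 1

        yield space, text[i:end]
        i = end


items = list(iter_items("<p>Hello  world</p>"))
# items == [("", "<p>"), ("", "Hello"), ("  ", "world"), ("", "</p>")]
```

In this model, testing `space != ""` plays the role of asking whether an item had leading space, and `space` itself corresponds to retrieving the actual characters.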
Other information such as the position can also be retrieved for each parsed item.
In addition, sections of code can be read using the XMLParseLoadContent function. This can be used to load a code segment, HTML or CDATA.
API functions are provided to examine the parsed item. As mentioned, aside from overflow considerations, there is little support for error checking. By contrast, the SGML Parse Object has error processing to report and recover the position when encountering a missing tag closing character (‘>’) or a missing closing quote on a value.
Object Control:
Item Parse:
Item Properties and Statistics:
Related Functions:
Page revised 2025-08-15
© 2012-2025 Novaworks, LLC. All rights reserved worldwide. Unauthorized use, duplication or transmission is prohibited by law. Portions of the software are protected by US Patents 10,095,672, 10,706,221 and 11,210,456. Novaworks, GoFiler™ and Legato™ are registered trademarks of Novaworks, LLC. EDGAR® is a federally registered trademark of the U.S. Securities and Exchange Commission. Novaworks is not affiliated with or approved by the U.S. Securities and Exchange Commission. All other trademarks are the property of their respective owners. Use of the features specified in this language are subject to terms, conditions and limitations of the Software License Agreement.