5.10 XML Parse Object

GoFiler Legato Script Reference

Legato v 1.6d

Application v 6.1b

Chapter Five — General Functions (continued)

5.10 XML Parse Object

5.10.1 Overview

The Legato API provides a number of SGML-style parsing options. The XML Parse Object is based on the internal XML Mini Parse. It is a lightweight parser that performs very little error checking and does not employ a DTD or schema. As such, it is well suited to fast parsing of XML and HTML.

The parse object is based on the Word Parse Object using the WP_SGML_TAG parse mode, combined with the SGML tag API functions, with glue code and an interface, to provide a simple and effective parser.

Unlike the SGML Object, the XML Parse Object does not directly parse CSS property/value pairs. That is left to the programmer.

The parser does not require any previous knowledge or setup for concept names, namespaces, or tag and data structure. All of those topics and tasks are left to the developer.

5.10.2 Basic Operation

An object is created using the XMLParseCreate function. Data can be sourced from several common objects such as the Mapped Text, Edit and File Objects, directly from a file, or directly from a string. However the object is created, no parsing is performed on the initial call; a file opened directly is loaded into memory but not processed.

Basic operation is to load the source and then parse item by item using the XMLParseGetItem function. XMLParseGetItem can operate in raw mode or normal mode. Raw mode allows raw tags to be loaded and examined; however, a number of API functions will not operate because the data has not been ‘cracked’. In normal mode, when a parsed item is detected as a tag, it is loaded into the element, concept and prefix values and parsed into the internal attribute array. These actions cut up the parse buffer, so the item returned from XMLParseGetItem will contain only the element as a tag.
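The ‘cracking’ step described above can be pictured with a small Python sketch. This is not Legato code and not the parser's actual implementation: crack_tag and the dictionary keys are hypothetical names chosen for illustration, and the split of a qname into prefix and concept at the last colon is an assumption based on the function descriptions later in this section.

```python
import re

# Hypothetical helper for illustration only; not a Legato function.
_ATTR_RE = re.compile(r"""([^\s=/>]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def crack_tag(raw):
    """Split a raw tag like '<us-gaap:Assets id="a1">' into its parts."""
    inner = raw.strip()[1:-1].strip()      # drop the surrounding '<' and '>'
    closing = inner.startswith("/")        # '</p>' style close tag
    if closing:
        inner = inner[1:].lstrip()
    self_contained = inner.endswith("/")   # '<br/>' style tag
    if self_contained:
        inner = inner[:-1].rstrip()
    parts = inner.split(None, 1)           # qname first, attributes after
    qname = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    # Assumption: prefix and concept are the two halves of the qname.
    prefix, _, concept = qname.rpartition(":")
    attributes = {}
    for m in _ATTR_RE.finditer(rest):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attributes[m.group(1)] = value
    return {
        "element": qname,
        "prefix": prefix,
        "concept": concept,
        "attributes": attributes,
        "closing": closing,
        "self_contained": self_contained,
    }
```

In raw mode none of this splitting would take place, which is why functions that rely on the cracked values have nothing to report.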

As data is parsed, it is classified into four categories: tags, declarations, comments and other. Tags are detected simply by looking at the first character and testing for the open chevron (‘<’). Once detected, comments and declarations are further refined by looking for ‘<!--’ or simply ‘<!’, respectively. Comments and declarations are always treated as raw items and are not cracked, regardless of the raw mode setting. Other items are non-structured text, including character entities, whether named or specified by character position.
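The classification rule above amounts to a few prefix tests applied from most to least specific. A minimal Python sketch follows, assuming the item has already been isolated as a string; classify is a hypothetical name, not a Legato function.

```python
def classify(item):
    """Classify a parsed item using only its leading characters."""
    if item.startswith("<!--"):
        return "comment"        # most specific test first
    if item.startswith("<!"):
        return "declaration"    # any other '<!' item
    if item.startswith("<"):
        return "tag"            # open chevron only
    return "other"              # text, including character entities
```

The order of the tests matters: a comment also begins with ‘<!’, so it must be checked before the declaration test.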

Reported parse positions differ depending on the source of the material. For objects and files, the reported position will be in zero-based X/Y (character and line) positions. For a supplied string, the reported positions will be only in zero-based X positions, with the Y position always reported as zero. If X/Y positions are desired for a string, open the string with the CreateMappedTextString function and then use the Mapped Text Object handle to create the XML Parse Object.
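To make the difference between the two position styles concrete, the following Python sketch converts a flat zero-based string offset (the string-source style, with Y always zero) into a zero-based X/Y pair (the object and file style). It is an illustration only: offset_to_xy is a hypothetical helper, and it assumes lines are separated by a single newline character.

```python
def offset_to_xy(text, offset):
    """Convert a flat zero-based offset into zero-based (x, y)."""
    y = text.count("\n", 0, offset)               # lines before the offset
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when on the first line
    return offset - line_start, y
```

For the text "<a>\n  <b/>\n</a>", the ‘<b/>’ tag begins at flat offset 6, which corresponds to X = 2 on line Y = 1.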

Unlike with the SGML Object, spaces are skipped and accumulated in a space buffer. If the only interest is whether a word space has been encountered prior to the item, the XMLParseHasLeadingSpace function can be used. If the white space is important to the operation, the XMLParseGetSpace function can be used to retrieve the actual space characters.
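The space buffer idea can be sketched in Python as a tokenizer that hands back each item together with the white space skipped before it. This is a conceptual illustration rather than the parser's implementation: iter_items is a hypothetical name, and splitting non-tag text into words is an assumption drawn from the parser's Word Parse heritage.

```python
import re

# A run of white space followed by a comment, a tag, or a word.
_ITEM_RE = re.compile(r"(\s*)(<!--.*?-->|<[^>]*>|[^\s<]+)", re.S)

def iter_items(text):
    """Yield (leading_space, item) pairs in document order."""
    for m in _ITEM_RE.finditer(text):
        yield m.group(1), m.group(2)

def has_leading_space(space):
    """True when any white space preceded the item."""
    return len(space) > 0
```

Here the length of the returned space string plays the role of a space count, and the string itself preserves the exact characters when they matter, for example inside preformatted content.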

Other information such as the position can also be retrieved for each parsed item.

In addition, sections of code can be read using the XMLParseLoadContent function. This can be used to load a code segment, HTML or CDATA.

API functions are provided to examine the parsed item. As mentioned, aside from overflow considerations, there is little support for error checking. By contrast, the SGML Parse Object has error processing to report and recover the position when encountering a missing tag closing character (‘>’) or a missing closing quote on a value.

5.10.3 Setting Up a Parse Operation

5.10.4 XML Parse Functions

Object Control:

XMLParseCreate — Creates an XML Parse Object.

XMLParseSetOptions — Sets XML parsing options.

Item Parse:

XMLParseGetItem — Parses the next item and processes tag information as required.

XMLParseGetPosition — Returns the zero-based start and end X/Y position of the last parsed item.

XMLParseGetEndX — Returns the zero-based ending X position of the last parsed item.

XMLParseGetStartX — Returns the zero-based start X position of the last parsed item.

XMLParseLoadContent — Returns the content of data within tags or a CDATA object.

XMLParseSkippedComments — Tests whether comments were skipped to get to the current item.

Item Properties and Statistics:

XMLParseGetAttributes — Returns a list or table of attributes contained within the last parsed item.

XMLParseGetConcept — Returns the concept name from the last parsed item.

XMLParseGetElement — Returns the element qname from the last parsed item.

XMLParseGetPrefix — Returns the element prefix as part of the qname for the last parsed item.

XMLParseGetSpace — Returns any white space detected prior to the last parsed item.

XMLParseGetSpaceSize — Returns the number of white space characters prior to the parsed item.

XMLParseHasLeadingSpace — Returns a boolean value as to whether there was prior space.

XMLParseIsComment — Tests whether the last item parsed was a comment.

XMLParseIsDeclaration — Tests whether the last item parsed was a declaration.

XMLParseIsSelfContained — Returns a boolean value as to whether the tag is self-contained.

XMLParseIsTag — Returns a boolean value as to whether the last item parsed was a tag.

Related Functions:

CloseHandle — Closes an object handle and releases any associated resources.

FileToString — Loads content of a file into a string.

GetEditObject — Gets an Edit Object and associates it with an edit window.

OpenMappedTextFile — Opens the specified file as a Mapped Text Object.

Page revised 2025-08-15

© 2012-2025 Novaworks, LLC. All rights reserved worldwide. Unauthorized use, duplication or transmission is prohibited by law. Portions of the software are protected by US Patents 10,095,672, 10,706,221 and 11,210,456. Novaworks, GoFiler™ and Legato™ are registered trademarks of Novaworks, LLC. EDGAR® is a federally registered trademark of the U.S. Securities and Exchange Commission. Novaworks is not affiliated with or approved by the U.S. Securities and Exchange Commission. All other trademarks are the property of their respective owners. Use of the features specified in this language are subject to terms, conditions and limitations of the Software License Agreement.