
Friday, December 30. 2016

Legato Developers Corner #15: String Testing and Translation

This week we will be covering scanning, testing, and some functions for manipulating textual string data.




Introduction


In certain cases, it is necessary to manually scan and test string content. One could manually step through a string character by character and perform specific tests. This would be slow and cumbersome. Fortunately, the Legato SDK contains hundreds of functions to process and manipulate strings. In this article, we will be discussing parsing, basic boolean test functions, word analysis functions, and some conversion functions.


Boolean and Bitwise Results


Functions that test or retrieve data frequently return either a boolean value or a bitwise result. Both are fundamentally numbers. For a boolean, a value of 0 is FALSE and any non-zero value (usually 1) is TRUE. Bitwise return values are typically defined as a dword, a 32-bit unsigned integer. By convention, the top bit (0x80000000) is not used as part of the result since it normally indicates that the value is a formatted error code.


With boolean return values, it is easy to perform operations such as:


if (HasText(mystring)) { ... }


Bitwise return values can contain a lot of information and are normally tested with bit and ordinal masks. This will be covered in detail later. Since working with raw values like 0x00010000 can be clumsy, the SDK contains named definitions for the values returned by the common bitwise functions.


A sample use of bitwise return values might look like this:


result = AnalyzeText(mystring);


if ((result & TEXT_TYPE_MASK) == TEXT_TYPE_HEADING) { ... }
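
The conventions above can be sketched outside of Legato. The following is a minimal Python illustration, not SDK code: only the 0x80000000 error bit comes from the text, while the TEXT_TYPE_* values are hypothetical and simply mirror the AnalyzeText example.

```python
# Hypothetical values for illustration only; they are not Legato SDK definitions.
ERROR_BIT         = 0x80000000  # top bit: the value is a formatted error code
TEXT_TYPE_MASK    = 0x000F0000  # assumed position of an ordinal field
TEXT_TYPE_HEADING = 0x00020000  # assumed value of one ordinal

def is_error(result):
    """A bitwise result with the top bit set is an error code, not data."""
    return (result & ERROR_BIT) != 0

def is_heading(result):
    """Check for an error first, then isolate the ordinal and compare it."""
    if is_error(result):
        return False
    return (result & TEXT_TYPE_MASK) == TEXT_TYPE_HEADING

print(is_heading(0x01020000))  # other bits may be set alongside the ordinal
print(is_heading(0x81020000))  # same bits, but flagged as an error
```

Checking the error bit before applying any mask avoids mistaking an error code for a valid result that happens to share bits with it.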


A Short Review of the Word Parse Object


Generally speaking, we have to have data to test. The easiest way to look through a string of text is by using the Word Parse Object. The object supports a series of functions that are specifically tailored to “parse” or process textual data. The general-purpose parse runs in three modes: text, tags, and program. The text mode provides word parsing for reading general text. The tags mode is tailored to work on XML, HTML, or SGML tags and character entities. Finally, program mode is made to parse typical program or script text.


We will be focusing on the default parsing mode or text mode (WP_GENERAL), which stops on word spaces, returns (line endings), and punctuation within the textual information.


Basic Operation


The general steps are as follows: create (get handle), load/set data, and iterate until the data has been exhausted. New data can be repeatedly loaded to the same object to process multiple buffers or lines. After completion, the Word Parse Object handle should be closed.


As each item is parsed, the leading spaces and statistics are stored. For example, the caller can check to see if there are leading spaces and even get the white space character as a string. 


Once the source data is set, the source variable can be changed or released. The Word Parse Object makes an internal copy of the data.


Setting Up a Parse Operation


The first action is to create a Word Parse Object and retrieve a handle. That handle is then used in subsequent operations to move through the text and examine each parsed item. For example:


        handle          hWP;
        string          s1, s2;
        int             spaces, count, pos;
        
        s1 = "My favorite pastime is  waiting for my browser to load a page.\rEnd.";

        hWP = WordParseCreate();
        if (hWP == NULL_HANDLE) {
          MessageBox('x', "Error on handle");
          exit;
          }
        WordParseSetData(hWP, s1);
        s2 = WordParseGetWord(hWP);
        while (s2 != "") {
          count++;
          pos = WordParseGetPosition(hWP);
          spaces = WordParseGetSpaceSize(hWP);
          AddMessage("   %3d %3d %3d :%s:", count, pos, spaces, s2);
          s2 = WordParseGetWord(hWP);
          }

        CloseHandle(hWP);

The result in the log:


     1   2   0 :My:
     2  11   1 :favorite:
     3  19   1 :pastime:
     4  22   1 :is:
     5  31   2 :waiting:
     6  35   1 :for:
     7  38   1 :my:
     8  46   1 :browser:
     9  49   1 :to:
    10  54   1 :load:
    11  56   1 :a:
    12  62   1 :page.:
    13  67   1 :End.:

In this case, the parse object is created with the default mode (text). A string is added to the parse object and then each successive word is retrieved along with certain attributes. When added to the log, each returned string is surrounded with colons (“:”) to illustrate that it does not contain white space. Note that, as shown in the log, the first entry has no leading spaces, there is an additional space before “waiting”, and the return before “End.” is counted as a single leading space.
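
To make the position and space columns easier to read, here is a small Python stand-in that reproduces the numbers in the log above. It is only a sketch that splits on white space; it says nothing about how the Word Parse Object is implemented, and the function name parse_words is made up for this example.

```python
import re

def parse_words(text):
    """Return (count, position, spaces, word) for each white space delimited word."""
    rows = []
    previous_end = 0
    for count, match in enumerate(re.finditer(r"\S+", text), start=1):
        spaces = match.start() - previous_end  # white space skipped before the word
        position = match.end()                 # zero-based index just past the word
        rows.append((count, position, spaces, match.group()))
        previous_end = match.end()
    return rows

s1 = "My favorite pastime is  waiting for my browser to load a page.\rEnd."
for count, pos, spaces, word in parse_words(s1):
    print("   %3d %3d %3d :%s:" % (count, pos, spaces, word))
```

Read this way, the position reported in the log is the index of the character immediately following the word, and the space size is the number of white space characters between the previous word and the current one.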


Functions are provided to retrieve and change the parsing position. In addition, a parse object can be used repeatedly, provided the parse mode remains the same.


Skipping Through a String


Another option while parsing is to skip through a string looking for text or spaces. A series of functions is provided that take a string and a zero-based index as parameters and then return a new index position.


  Function            Description
  SkipBackWordSpaces  Skips back from a specified index to the first non-word space character.
  SkipToLineEnding    Skips forward to the next line ending character (0x0D/0x0A).
  SkipToNonText       Skips forward to a character that is not alphanumeric.
  SkipToWordSpace     Skips forward until a word space is found.
  SkipWordSpaces      Skips forward until not on a word space.

 


Once positions have been established, they can be used with the word parser or functions such as the GetStringSegment function. 
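
The index-in, index-out pattern can be sketched in Python. The two functions below approximate the described behavior of SkipWordSpaces and SkipToWordSpace; they are not the SDK functions. The set of word space characters follows the IsWordSpace description later in this article, and returning the string length when nothing is found is an assumption.

```python
# Word space characters as described for IsWordSpace: space, return, tab, new line, 0x00.
WORD_SPACES = " \r\t\n\x00"

def skip_word_spaces(text, index):
    """Move forward from index until the character is not a word space."""
    while index < len(text) and text[index] in WORD_SPACES:
        index += 1
    return index

def skip_to_word_space(text, index):
    """Move forward from index until a word space is found."""
    while index < len(text) and text[index] not in WORD_SPACES:
        index += 1
    return index

line = "  Total   revenue"
start = skip_word_spaces(line, 0)      # first character of the first word
end = skip_to_word_space(line, start)  # first word space after that word
print(start, end, line[start:end])
```

Chaining the two calls brackets one word at a time, and the resulting pair of positions is what would be handed to a segment function.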


Testing a Word


To avoid having to test each character in a word to determine its type, a large set of SDK functions is provided to test whether a string, or in some cases a character, meets a particular criterion. The following is a partial list of the ‘Is’ or ‘Has’ functions that return TRUE (1) or FALSE (0) depending on the result.


  Function                  Description
  HasNumeric                Tests a string for any numeric characters (digits).
  HasText                   Tests a string for any text (alpha characters).
  IsAllLower                Tests a string for all lower case on any text that is present.
  IsAllUpper                Tests a string for all upper case on any text that is present.
  IsASCII                   Tests a string for non-ASCII characters allowing for return and tab characters.
  IsASCIICharacters         Tests a string for non-ASCII (no control characters).
  IsAccounting              Tests a string for accounting characters (e.g., -123 or (34,555.44)).
  IsAlpha8859               Tests a character or string for ASCII letters and ISO-8859 Latin letter characters.
  IsAlphaNumeric8859        Tests a character or string for ASCII letters, numbers, and ISO-8859 Latin letter characters.
  IsAlpha                   Tests a character or string for ASCII characters.
  IsCurrency                Tests a character or string as currency group (allows ‘.’ ‘,’ and ‘(’ ‘)’ characters).
  IsCurrencyFormatted       Tests a string as properly formatted currency (US, Euro, Pounds, Yen, cents).
  IsCurrencyPrefix          Tests a string for currency leader (e.g., USD$ or CAN$) allowing for other ISO-4217 codes. Following number can be loosely formatted as accounting.
  IsCurrencyProper          Tests a string for properly structured currency, e.g., €128,333, allowing for multiple commas and periods for US and European formats. It does not check the number of digits.
  IsDrawing                 Tests character or string for a limited set of characters commonly used for drawing such as ‘-’ or ‘=’.
  IsFalse                   Checks string for common terms such as ‘no’, ‘false’, ‘0’ or empty checkboxes as being the same as logical FALSE.
  IsFootnoteReference       Checks string for typical footnote characters, numbers or letters in the hole: ‘(1)’ or ‘(b)’.
  IsHTML                    Tests a string as being HTML by checking for certain HTML tags.
  IsHex                     Tests a character or string as being valid hex characters. It cannot have the ‘0x’ prefix.
  IsInString                Checks for all or part of string or character inside of a target string (like InString but with a boolean result).
  IsLeaderBackFill          Looks backward in a string for leader fill characters. The string can contain text prior to the leader.
  IsLeaderFill              Looks in string for leader fill characters.
  IsLower                   Checks a string (word) for being lower case. The word cannot contain any non-alpha characters.
  IsNil                     Checks a character or string as being a financial ‘nil’ value.
  IsNonBreakingSpace        Checks a character or a string as being non-breaking space character(s) (0xA0 or 160).
  IsNonBreakingSpaceEntity  Checks for the start of a string being a non-breaking space (PCDATA) as &nbsp; or &#160; characters.
  IsNonBreakingSpacePCDATA  Checks for string containing only non-breaking spaces and optional word spaces.
  IsNumeric                 Tests a string for strictly numeric (digits).
  IsPCDATARequired          Checks a string for a requirement to encode as PCDATA.
  IsPercentage              Checks a string as a percentage value such as 0% or 00.00% etc.
  IsReal                    Checks a string as being a real number.
  IsRealStrict              Checks a string as a real number but with strict requirements.
  IsRegexMatch              Performs a regular expression pattern match on string data.
  IsRoman                   Tests a character as a roman numeral or a string as a roman number.
  IsSectionNumber           Tests a string for being a section number (e.g., 1, 2.2, 2.1.1., etc.).
  IsStringPadded            Tests to see if a string has space padding either before or after.
  IsTabbedString            Checks a string for tab characters.
  IsText                    Tests a string for being a textual word with or without conventional punctuation.
  IsTrue                    Tests a string for common terms such as ‘yes’, ‘true’, ‘1’ or various checked checkbox styles being the same as logical TRUE.
  IsUpper                   Checks a string (word) for being upper case. The word cannot contain any non-alpha characters.
  IsSGMLCharacterEntity     Tests a string for basic SGML character entity structure. It does not check the actual character specification.
  IsSGMLEmptyElement        Tests a string for basic SGML character entity structure. It does not check the actual character specification.
  IsSGMLTag                 Tests a string for basic SGML tag structure. Does not check the content other than for <a> or </a> structure.
  IsValidSGMLAttribute      Tests a string for a valid syntax attribute name (with name space).
  IsValidSGMLElement        Tests a string for a valid syntax element name (with name space).
  IsWildListMatch           Matches a list of items against a target string with wildcards. The match string list can be a semicolon separated list of test cases.
  IsWildMatch               Checks two strings for wild card match. Matches string 2 to string 1.
  IsWildString              Tests a string for containing one or more wild card characters.


Some of the above functions will also accept and test an individual character. Functions that test only characters are also available:


  Function                   Description
  IsANSISpace                Tests a character as white space including backspace (back tab, 0x08) and tab (0x09).
  IsAlphaNumeric             Tests a character for ASCII letters or numbers.
  IsDigit                    Tests a character as a number (0-9).
  IsExpressionCharacter      Tests a character that could be used in an expression (e.g., { < +, etc.).
  IsExpressionGroup          Tests a character used in an expression group (i.e., " ' ( or [ ).
  IsExtendedAlpha            Tests a character to see if it is part of ISO-8859 Latin alpha set commonly used in English.
  IsFinancial                Checks character as a number, currency or ‘,’ or ‘.’.
  IsSentenceDelimiter        Tests character as one that delimits a sentence (. ! : ?).
  IsValidSGMLCharacter       Tests a character that can be part of an SGML element or attribute.
  IsValidSGMLStartCharacter  Tests a character as a start character in an SGML element or attribute.
  IsValidVariableCharacter   Tests a character that can be used in a programming variable (no lead character exclusion).
  IsVowel                    Tests a character as western vowel (A E I O U, upper and lower case).
  IsWordDelimiter            Tests a character as word style delimiter (‘' . , : ; ! ? ( ) [ ] { }’).
  IsWordSpace                Tests a character as a Word Space (space, return, tab, new line, 0x00).
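
For readers who want to see what a few of these tests amount to, the Python sketch below approximates four of them from their one-line descriptions. These are not the SDK implementations, the function names are lower-cased stand-ins, and the handling of an empty string is an assumption.

```python
def is_digit(ch):
    """Approximates IsDigit: a single character in the range 0-9."""
    return len(ch) == 1 and "0" <= ch <= "9"

def is_vowel(ch):
    """Approximates IsVowel: A E I O U in upper or lower case."""
    return len(ch) == 1 and ch in "AEIOUaeiou"

def is_sentence_delimiter(ch):
    """Approximates IsSentenceDelimiter: one of . ! : ?"""
    return len(ch) == 1 and ch in ".!:?"

def is_hex(text):
    """Approximates IsHex: only hex digits, so a '0x' prefix fails the test."""
    return text != "" and all(c in "0123456789abcdefABCDEF" for c in text)

print(is_hex("1F3A"), is_hex("0x1F3A"))
print(is_vowel("e"), is_sentence_delimiter(";"))
```

The value of the SDK functions is that each of these loops and character sets is already written and tested, so a script can express the intent in a single call.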

 


Another powerful tool is the GetWordType function. The GetWordType function analyzes the content of a provided word and returns the type and attributes. The prototype:


dword = GetWordType ( string data );


The data parameter is a string containing a word without leading or trailing spaces. The returned value is a 32-bit dword (an unsigned integer) containing bitwise information. The results can be any of the following:


  Definition   Bitwise   Description
  Item Types        
    WT_TYPE_ITEM_MASK   0x000F0000   Item Type Mask
    WT_TYPE_UNKNOWN   0x00000000   Unknown Value
    WT_TYPE_WORD   0x00010000   Word (dog, cat, monkey)
    WT_TYPE_NUMBER   0x00020000   Number
    WT_TYPE_NUMBER_SERIAL   0x00030000   Serial Number (12, 63)
    WT_TYPE_LEADER   0x00040000   Leader Line
    WT_TYPE_RULER   0x00050000   Ruler (possible or dash, nil)
    WT_TYPE_CURRENCY_LEADER   0x00060000   Opening Currency “$  1,121”
    WT_TYPE_NIL   0x00070000   Nil or Compound Nil “--(a)” or “—” or “$-”
    WT_TYPE_DATE   0x00080000   Date “12/12/12”, “12.12.12”, “23:22” or ISO
  Word Variations        
    WT_WORD_MASK   0x00700000   Word Type Mask
  Types        
    WT_WORD_UNKNOWN   0x00000000   Unknown or General Word Type
    WT_WORD_LOWER   0x00100000   Lower Case Word
    WT_WORD_UPPER   0x00200000   Upper Case Word
    WT_WORD_INITIAL   0x00300000   Initial Capital
  Word Flags        
    WT_WORD_TRAIL_MASK   0x000000FF   Punctuation (low in char)
    WT_WORD_TRAIL_PUNCTUATION   0x00800000   Trails Punctuation (in low char)
    WT_WORD_QUOTED   0x01000000   Word Quoted (can be partial)
    WT_WORD_IN_HOLE   0x02000000   Word has Parenthesis or Brackets
    WT_WORD_LEADER_TRAIL   0x04000000   Word has a Trailing Leader Line
  Lexicon        
    WT_WORD_LEXICON_MASK   0x70000000   Lexicon Mask
    WT_WORD_DATE_MONTH   0x10000000   Word is in Month Lexicon
    WT_WORD_DATE_DAY   0x20000000   Word is in Day Lexicon
    WT_WORD_HONORIFIC   0x30000000   Word is in Honorific Lexicon
  Number Variations        
    WT_NUMBER_ALIGN_MASK   0x000000FF   Alignment Position at Size
  Types        
    WT_NUMBER_MASK   0x00700000   Number Type Mask
    WT_NUMBER_UNKNOWN   0x00000000   Unknown Type
    WT_NUMBER_YEAR   0x00100000   Number is Year (1900-2099)
    WT_NUMBER_DAY   0x00200000   Number is Day (1-31)
    WT_NUMBER_FORMATTED   0x00300000   Number is Formatted
    WT_NUMBER_LIST   0x00400000   Part of a List (1-99 with trail)
  Number Flags        
    WT_NUMBER_NEGATIVE   0x01000000   Negative Number (000) or -000
    WT_NUMBER_IN_HOLE   0x02000000   Negative Number (000)
    WT_NUMBER_FOOTNOTE   0x04000000   Has Footnote
    WT_NUMBER_CURRENCY   0x08000000   Has Currency
    WT_NUMBER_PERCENT   0x10000000   Has Percent
    WT_NUMBER_IN_HOLE_ERROR   0x20000000   Error in Parenthetical
    WT_NUMBER_BAD_FORMAT   0x40000000   Bad Format (characters, not structure)
  Leader Variation        
    WT_LEADER_SIZE_MASK   0x00000FFF   Word Type Mask (character in bottom)
  Ruler Variations        
    WT_RULER_MASK   0x00700000   Drawing Character in the Lower 8-bits
    WT_RULER_CHARACTER   0x000000FF   Mask for Ruler Character
  Ruler Types        
    WT_RULER_MIXED   0x00000000   Of Indeterminate Type
    WT_RULER_SUBTOTAL   0x00100000   Subtotal Type
    WT_RULER_TOTAL   0x00200000   Total Type
  Ruler Flags        
    WT_RULER_DASH   0x01000000   Possible Connecting Dash
  Date Variations        
    WT_DATE_MASK   0x0F000000   Date Code Mask
    WT_DATE_AS_GENERAL   0x00000000   Date as Any Type (short mm/yy not supported)
    WT_DATE_ISO_8601   0x01000000   Date as ISO (in part, w w/o time)
    WT_DATE_TIME_ONLY   0x02000000   A Time with Optional AM/PM
  Unknown Word Data        
    WT_UNKNOWN_ALPHA   0x0000000F   Alpha Count
    WT_UNKNOWN_NUMERIC   0x000000F0   Numeric Count
    WT_UNKNOWN_CURRENCY   0x00000300   Currency Count (4)
    WT_UNKNOWN_PUNCTUATION   0x00000C00   Sentence Punctuation Count (4)
    WT_UNKNOWN_COMMA_PERIOD   0x00003000   Comma Period Count
    WT_UNKNOWN_GROUP   0x0000C000   Parenthesis/Brace Group
    WT_UNKNOWN_QUOTE   0x00300000   Quote Character Count
    WT_UNKNOWN_FOOTNOTE   0x00C00000   Footnote Type Characters
    WT_UNKNOWN_RULE   0x03000000   Rule Character Count
    WT_UNKNOWN_ELLIPSE   0x0C000000   Ellipse Count
    WT_UNKNOWN_OTHER   0x30000000   Other Count

 


Depending on your programming background, bitwise operations may be a bit foreign. They are widely used under the hood in many environments and can be very efficient at conveying a lot of information in a small form factor. Generally, the binary information is segmented into flags and ordinals. Flags are simple: if a bit is set, then the condition is true. Ordinals, on the other hand, require a mask to isolate the associated group of bits. Those bits in turn represent one of a set of conditions. For example, the resulting dword can be filtered by ‘ANDing’ the result with the WT_TYPE_ITEM_MASK value:


        code = GetWordType(word);
        switch (code & WT_TYPE_ITEM_MASK) {
          case WT_TYPE_UNKNOWN:
            break;
          case WT_TYPE_WORD:
            break;
          case WT_TYPE_NUMBER:
            break;
          case WT_TYPE_NUMBER_SERIAL:
            break;
          case WT_TYPE_LEADER:
            break;
          case WT_TYPE_RULER:
            break;
          case WT_TYPE_CURRENCY_LEADER:
            break;
          case WT_TYPE_NIL:
            break;
          case WT_TYPE_DATE:
            break;
          }

Each case section can then count or act upon the details of the item. For example, if the type is date, then the WT_DATE_ items can be tested to narrow the type of date.
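
The difference between ordinals and flags can be shown with plain arithmetic. The Python sketch below uses mask and type values copied from the GetWordType table above and two sample codes taken from the log output shown later in this article (“July” and “$110”). The describe function and its labels are made up for this example.

```python
# Values copied from the GetWordType table in this article.
WT_TYPE_ITEM_MASK    = 0x000F0000
WT_TYPE_WORD         = 0x00010000
WT_TYPE_NUMBER       = 0x00020000
WT_WORD_MASK         = 0x00700000
WT_WORD_LEXICON_MASK = 0x70000000
WT_WORD_DATE_MONTH   = 0x10000000
WT_NUMBER_MASK       = 0x00700000
WT_NUMBER_CURRENCY   = 0x08000000

WORD_CASE = {0x00000000: "general", 0x00100000: "lower",
             0x00200000: "upper", 0x00300000: "initial capital"}
NUMBER_KIND = {0x00000000: "unknown", 0x00100000: "year", 0x00200000: "day",
               0x00300000: "formatted", 0x00400000: "list"}

def describe(code):
    """Isolate the item type first, then read the bits that belong to that type."""
    item = code & WT_TYPE_ITEM_MASK                 # ordinal: mask, then compare
    if item == WT_TYPE_WORD:
        parts = ["word", WORD_CASE.get(code & WT_WORD_MASK, "other")]
        if (code & WT_WORD_LEXICON_MASK) == WT_WORD_DATE_MONTH:
            parts.append("month")
        return parts
    if item == WT_TYPE_NUMBER:
        parts = ["number", NUMBER_KIND.get(code & WT_NUMBER_MASK, "other")]
        if code & WT_NUMBER_CURRENCY:               # flag: a single bit test
            parts.append("currency")
        return parts
    return ["other"]

print(describe(0x10310000))  # code logged for "July"
print(describe(0x08020004))  # code logged for "$110"
```

The order matters because the same bits carry different meanings for different item types. In the table, 0x10000000 is the month lexicon value for a word but the percent flag for a number, so the item type has to be established before anything else is interpreted.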


The GetWordType function is useful for aggregating information from a text stream to perform high level analysis. For example, a line of text can be parsed, information accumulated, and the first and last word data examined to determine the probability of the line being a heading, part of a paragraph, or perhaps a row of a table.


Analysis is performed on a gross level basis. That is, types of characters are counted and then run through logic to perform a basic analysis. For example, if one or two dashes are present without text, the content will be considered a “nil” value as would be seen in a table. On the other hand, three dashes would be considered a possible rule or visual aid.
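
The count-then-decide approach can be illustrated with the dash example. The Python sketch below is a guess at the shape of such logic, not the SDK's actual rules: it handles only the plain hyphen, and treating more than three dashes the same as three is an assumption.

```python
def classify_dashes(word):
    """Count characters first, then decide: 1-2 dashes is nil, 3 or more is a rule."""
    dashes = word.count("-")
    others = len(word) - dashes
    if dashes == 0 or others > 0:
        return "other"          # text is present, or there are no dashes at all
    return "nil" if dashes <= 2 else "rule"

for sample in ["-", "--", "---", "-5", "n/a"]:
    print(repr(sample), classify_dashes(sample))
```

Working from counts rather than exact patterns is what lets the same analysis tolerate the many ways a table cell or rule line can be typed.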


In addition, there are the related functions GetListType and GetNumericType, which are similar in operation to GetWordType but return data specific to list items and numbers, respectively.


The words to test should be passed to the function without spaces. If the Word Parse Object is employed with WP_GENERAL mode, the data returned is compatible with analysis.


Converting Common String Forms


The Legato SDK also contains a number of functions for performing common string conversions and operations:



  Function                     Description
  ChangeCase                   Changes the case of a string, including HTML.
  CharacterToLowerCase         Converts a character to lower case (ANSI only).
  CharacterToUpperCase         Converts a character to upper case (ANSI only).
  ConformAddressString         Conforms the case and style of an address line.
  ConvertAddNewlines           Copies string and adds newline (0x0A) characters to return (0x0D) characters.
  ConvertDeleteNewlines        Copies string while deleting newline (0x0A) characters.
  ConvertFromEscapeCharacters  Copies from escaped characters (with backslash such as \r or \n).
  ConvertFromUnderbars         Converts underbars in a string to spaces.
  ConvertFromUnderlines        Removes the static control underline characters.
  ConvertNoCodes               Converts a string and changes all control codes (including newlines, returns, tabs) to period (‘.’) characters.
  ConvertNoPunctuation         Converts a string by removing any punctuation.
  ConvertNoSpaces              Converts a string by removing all space characters (0x20).
  ConvertSoftBreaksToSpaces    Converts soft break characters (0x09, 0x0D, 0x0A) to spaces.
  ConvertToEscapeCharacters    Copies to escaped characters (with backslash such as \r or \n).
  ConvertToUnderbars           Copies with spaces changed to underbars.
  ConvertToUnderlines          Copies to static control underline characters using escaped ‘&’.
  ConvertWordSpaces            Converts all word spaces to single spaces.
  MakeLowerCase                Makes a string lower case (ANSI only).
  MakeUpperCase                Makes a string upper case (ANSI only).
  PadString                    Pads a string to a specified size with an optional fill string.
  ReplaceInString              Replaces matching strings inside another string with or without case sensitivity.
  ReplaceInStringRegex         Replaces matching strings inside another string using regular expression rules.
  ReverseString                Reverses the character position content of a string.
  TrailStringAfter             Trails off a string with an ellipsis (‘...’ characters) if it exceeds the specified size.
  TrailStringAfterAlways       Trails off a string with an ellipsis (‘...’ characters) at specified size or always.
  TrailStringBefore            Truncates a string and adds an ellipsis (‘...’ characters) at the start of the string if the length exceeds the specified size.
  TrimNonBreakingSpaces        Trims non-breaking spaces (as raw characters).
  TrimPadding                  Trims the padding on both left and right sides of string.
  TrimString                   Trims the trailing spaces from the right side (end) of a string.

 


Changing case is a common operation, which can be performed using the MakeLowerCase and MakeUpperCase functions. The ChangeCase function is substantially more sophisticated and allows for the processing of sentences of data in a number of modes, such as title capitalization.


Expanding Our Example


Let us add a few things to the above example:


        handle          hWP;
        string          s1, s2, s3;
        dword           type;
        int             spaces, count, pos;
        
        s1  = "On July 27, 2016, the Company: (i) purchased a dog; (ii) found a vet; ";
        s1 += "(iii) purchased a dog bed; and, (iv) spent $110 on a doggy ID chip. ";

        hWP = WordParseCreate();
        if (hWP == NULL_HANDLE) {
          MessageBox('x', "Error on handle");
          exit;
          }
        WordParseSetData(hWP, s1);
        s2 = WordParseGetWord(hWP);
        while (s2 != "") {
          count++;
          pos = WordParseGetPosition(hWP);
          spaces = WordParseGetSpaceSize(hWP);
          type = GetWordType(s2);
          s2 = ":" + s2 + ":";
          s2 = PadString(s2, 12);
          switch (type & WT_TYPE_ITEM_MASK) { 
            case WT_TYPE_UNKNOWN: s3 = "Unknown"; break; 
            case WT_TYPE_WORD: s3 = "Word"; break; 
            case WT_TYPE_NUMBER: s3 = "Number"; break; 
            case WT_TYPE_NUMBER_SERIAL: s3 = "Number (Serial)"; break; 
            case WT_TYPE_LEADER: s3 = "Leader"; break; 
            case WT_TYPE_RULER: s3 = "Ruler"; break; 
            case WT_TYPE_CURRENCY_LEADER: s3 = "Currency"; break; 
            case WT_TYPE_NIL: s3 = "Nil"; break; 
            case WT_TYPE_DATE: s3 = "Date"; break; 
            default: s3 = "";
            }
          AddMessage("   %3d %3d %3d 0x%08X %s %s", count, pos, spaces, type, s2, s3);
          s2 = WordParseGetWord(hWP);
          }

        CloseHandle(hWP);

The result in the log would appear as:


     1   2   0 0x00310000 :On:         Word
     2   7   1 0x10310000 :July:       Word
     3  11   1 0x00230002 :27,:        Number (Serial)
     4  17   1 0x00030005 :2016,:      Number (Serial)
     5  21   1 0x00110000 :the:        Word
     6  30   1 0x00B1003A :Company::   Word
     7  34   1 0x02110000 :(i):        Word
     8  44   1 0x00110000 :purchased:  Word
     9  46   1 0x00110000 :a:          Word
    10  51   1 0x0011003B :dog;:       Word
    11  56   1 0x02110000 :(ii):       Word
    12  62   1 0x00110000 :found:      Word
    13  64   1 0x00110000 :a:          Word
    14  69   1 0x0011003B :vet;:       Word
    15  75   1 0x02110000 :(iii):      Word
    16  85   1 0x00110000 :purchased:  Word
    17  87   1 0x00110000 :a:          Word
    18  91   1 0x00110000 :dog:        Word
    19  96   1 0x0011003B :bed;:       Word
    20 101   1 0x0011002C :and,:       Word
    21 106   1 0x02110000 :(iv):       Word
    22 112   1 0x00110000 :spent:      Word
    23 117   1 0x08020004 :$110:       Number
    24 120   1 0x00110000 :on:         Word
    25 122   1 0x00110000 :a:          Word
    26 128   1 0x00110000 :doggy:      Word
    27 131   1 0x00210000 :ID:         Word
    28 137   1 0x0091002E :chip.:      Word

We are using the PadString function to make a fixed size field in the log for the word, and we are still surrounding the word with colons as before. The return value from the GetWordType function is both translated to a friendly string and printed in hexadecimal form in the log. Note that the words “27,” and “2016,” are considered serial numbers, as in a list that could appear within narrative as opposed to a table cell.
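
The hexadecimal column carries more than the item type. As a reading aid, the Python sketch below decodes the word-specific bits of four codes taken from the log above, using mask values copied from the GetWordType table. The word_details helper is made up for this example and is not part of the SDK.

```python
# Values copied from the GetWordType table in this article.
WT_TYPE_ITEM_MASK         = 0x000F0000
WT_TYPE_WORD              = 0x00010000
WT_WORD_TRAIL_MASK        = 0x000000FF
WT_WORD_TRAIL_PUNCTUATION = 0x00800000
WT_WORD_IN_HOLE           = 0x02000000

def word_details(code):
    """Decode the word flags and the character held in the low byte."""
    if (code & WT_TYPE_ITEM_MASK) != WT_TYPE_WORD:
        return None                                  # these bits only apply to words
    low = code & WT_WORD_TRAIL_MASK
    return {
        "trail_char": chr(low) if low else "",       # low byte as a character
        "trail_flag": (code & WT_WORD_TRAIL_PUNCTUATION) != 0,
        "in_hole": (code & WT_WORD_IN_HOLE) != 0,
    }

samples = [("Company:", 0x00B1003A), ("dog;", 0x0011003B),
           ("(i)", 0x02110000), ("chip.", 0x0091002E)]
for word, code in samples:
    print(word, word_details(code))
```

In this log, the low byte holds the trailing character in every case where one is present, while the WT_WORD_TRAIL_PUNCTUATION flag is set for “Company:” and “chip.” but not for the words ending in a semicolon or comma. The parenthesized list markers such as “(i)” carry the WT_WORD_IN_HOLE flag.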


Conclusion


Since Legato is a part of GoFiler and GoFiler specializes in converting and editing text, many of the foundational string functions are exposed as script functions. If you cannot find a function to match your particular operation, contact technical support as it may already exist.




Scott Theis is the President of Novaworks and has been involved in the EDGAR industry for over thirty years. He has worked with the EDGAR system at multiple levels: as a financial printer, a member of the EDGAR design team, and as a software developer. He has extensive expertise with EDGAR, HTML, XBRL, and other programming languages.



Posted by
Scott Theis
in Development at 21:04