
Friday, October 11. 2019

LDC #156: Introduction to File Types in Legato

A very common scenario for a script is to get a file from a user, perform several actions on it, and then save the file. As developers, it is easy to fall into the “user is always correct” trap: we assume that if we ask for an HTML file, the user is going to give us one. Whether on purpose or by accident, sometimes that isn’t the case, and we should be prepared for it. This blog discusses how to validate the files we receive from the user.


Since we are using a Windows file system, most of the user’s files will have extensions, which makes the extension a good starting point for file type validation. Generally, unless there is a lack of computer skills (or malicious intent), a file’s extension matches its content. To get the extension, simply use Legato’s GetExtension function.


string = GetExtension ( string name );


This function returns the extension of the file, including the leading period. Adding this quick check reduces the chance of bad data causing unexpected behavior in your script. Consider a script that takes an HTML file and a CSV file of changes to make to the HTML file. The last thing a developer needs is an upset user who accidentally selected an XLSX document and watched the script fill their HTML file with bogus edits. Checking the file extension can prevent these kinds of mistakes. However, it doesn’t always help. Files with certain extensions, like XML, can have drastically different contents. An XBRL instance document is nothing like a 13F information table, yet both are XML and both use the XML extension.
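As a minimal sketch of the extension check described above, the function below accepts a file name only when its extension is .htm or .html. The function name is_html_extension is hypothetical, and MakeLowerCase is assumed to be the SDK's lowercase helper; GetExtension is the function shown above.

```legato
// Hypothetical helper: TRUE when the name ends in .htm or .html, in any letter case.
boolean is_html_extension(string name) {

    string ext;

    // GetExtension returns the extension with its leading period, e.g. ".HTM".
    // MakeLowerCase is assumed here so that ".HTM" and ".htm" compare the same.
    ext = MakeLowerCase(GetExtension(name));

    if ((ext == ".htm") || (ext == ".html")) {
      return TRUE;
      }
    return FALSE;
    }
```

A script could call this right after the user picks a file and stop with a message before any edits are made. Keep in mind that this only tests the name, not what is inside the file.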


So what can we do about this? Luckily, with Legato you can have the application test the contents of a file and determine its file type. This test includes narrowing down to specific types of XML if the application understands the XML coding. To do this we have two options:


dword = GetFileTypeCode ( string source, [boolean extensiononly] );

string = GetFileTypeString ( string source, [boolean extensiononly] );


These functions are essentially the same, except that GetFileTypeCode returns a numerical value representing the file type while GetFileTypeString returns a string representing it. Both functions open the file and read a small amount of data to determine information about the file. For example, when called on an XML file, the function looks at the namespaces in the file to determine what kind of XML it is. The extensiononly option skips the content analysis and classifies the file by its extension alone. This can still be useful since it handles file formats that have multiple extensions, like HTM and HTML. When using the GetFileTypeCode function, there are defines for the different file types. These can be found in the Legato SDK header file (Appendix A of the Legato documentation), but here are some of the common ones:


Define                   Description
FT_ANSI                  ANSI Format (CB)
FT_OEM                   OEM Format (CB)
FT_UNICODE               Unicode Text (CB)
FT_ASCII                 ASCII Text 7-bit
FT_TEXT                  Text Format (Coding Unknown)
FT_HTML                  HTML Native (CB/File Type)
FT_RTF                   Rich Text Format (CB)
FT_CSS                   Cascading Style Sheet
FT_LOG                   Log File (Text)
FT_WORD                  Microsoft Word
FT_POWERPOINT            Microsoft PowerPoint
FT_PDF                   Portable Document Format
FT_WORDPERFECT           WordPerfect
FT_PAGEMAKER             Adobe PageMaker
FT_INDB                  Adobe InDesign Book (INDB)
FT_INDD                  Adobe InDesign Document (INDD)
FT_IDML                  Adobe InDesign XML (IDML)
FT_SEC_MESSAGE           SEC Acceptance/Suspense Message
FT_CSV                   CSV (CB)
FT_XML                   XML (non-specific)
FT_XSD                   XML Style Data (non-specific)
FT_RSS                   Really Simple Syndication XML Data
FT_EXCEL                 Microsoft Excel
FT_IXBRL                 Inline XBRL File (XHTML)
FT_XBRL                  XBRL File Group Member
FT_XBRL_INS              Instance (main)
FT_XBRL_SCH              Schema
FT_XBRL_CAL              Calculation
FT_XBRL_DEF              Definition
FT_XBRL_LAB              Label
FT_XBRL_PRE              Presentation
FT_XBRL_REF              Reference
FT_XFR                   XBRL Financial Report (PSG, XDS)
FT_XFDL                  XFDL (EDGAR and Sec16 Filing)
FT_XML_SECTION_16        Section 16 XML (EDGAR)
FT_XML_FORM_13F          Form 13F XML (EDGAR)
FT_XML_FORM_13F_TAB      Form 13F Table XML (EDGAR)
FT_XML_FORM_13H          Form 13H XML (EDGAR)
FT_XML_FORM_17A          Form X-17A-5 XML (EDGAR)
FT_XML_FORM_17H          Form 17H XML (EDGAR)
FT_XML_FORM_C            Form C XML (EDGAR)
FT_XML_FORM_CFP          Form CFPORTAL XML (EDGAR)
FT_XML_FORM_D            Form D XML (EDGAR)
FT_XML_FORM_MA           Form MA XML (EDGAR)
FT_XML_FORM_N_CEN        Form N-CEN XML (EDGAR)
FT_XML_FORM_N_MFP        Form N-MFP XML (EDGAR)
FT_XML_FORM_N_MFP1       Form N-MFP1 XML (EDGAR)
FT_XML_FORM_N_PORT       Form N-PORT XML (EDGAR)
FT_XML_FORM_N_SAR        Form N-SAR XML (EDGAR)
FT_XML_FORM_SDR          Form SDR XML (EDGAR)
FT_XML_FORM_SDR_EXHIBIT  Form SDR XML (EDGAR Exhibit)
FT_XML_FORM_SDR_EX_A     Exhibit A - Controlling Persons
FT_XML_FORM_SDR_EX_B     Exhibit B - Chief Compliance Off
FT_XML_FORM_SDR_EX_C     Exhibit C - Director Governors
FT_XML_FORM_SDR_EX_G     Exhibit G - Affiliates
FT_XML_FORM_SDR_EX_I     Exhibit I - Service Provider Con
FT_XML_FORM_SDR_EX_T     Exhibit T - Subscriber Information
FT_XML_FORM_TA           Form TA XML (EDGAR, all)
FT_XML_EDGAR             EDGARLink Online (EDGAR XML)
FT_XML_EDGAR_S16         EDGARLink Online (Section 16 Only)
FT_XML_FORM_ABS          Form ABS XML (EDGAR)
FT_XML_ABS_AUTOLEASE     Auto Lease
FT_XML_ABS_AUTOLOAN      Auto Loan
FT_XML_ABS_CMBS          Commercial Mortgage
FT_XML_ABS_DS            Debt Securities
FT_XML_ABS_RMBS          Residential Mortgage
FT_XML_ABS_NOTES         Disclosure Notes (Ex-103)
FT_XML_REG_A             Regulation XML (EDGAR)
FT_NSAR                  NSAR Data (answer.fil)
FT_BITMAP                Bitmap (CB)
FT_GIF                   Graphics Interchange Format (CB)
FT_JPEG                  JPEG Image Format (CB)
FT_PNG                   Portable Network Graphic (CB)
FT_ZIP                   Zipped/Compressed
FT_GOFILER_PROJECT       GoFiler Project File (v 1.x & 2.x)
FT_GOFILER_PROJECT_3X    GoFiler Project File (v 3.x)
FT_GFP_3X_ELO            Normal EDGAR Link Online
FT_GFP_3X_13H            Form 13H
FT_GFP_3X_13F            Form 13F
FT_GFP_3X_MA             Form MA
FT_GFP_3X_SDR            Form SDR
FT_GFP_3X_RGA            Regulation A
FT_GFP_3X_17A            Form X-17A-5
FT_GFP_3X_C              Form C
FT_GFP_3X_CFP            Form CFPORTAL
FT_GFP_3X_17H            Form 17H
FT_GFP_3X_TA             Form TA
FT_GFP_3X_CEN            Form N-CEN
FT_GFP_3X_NPT            Form N-PORT
FT_GFP_3X_S16            Section 16 (Combined)
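
To tie the defines back to the HTML and CSV scenario from earlier, here is a rough sketch of a content-based check. The function name check_inputs is hypothetical. The MessageBox call (with 'x' as the error icon) and the ERROR_NONE and ERROR_EXIT return codes are assumed from common Legato usage rather than taken from this post. Only GetFileTypeCode and the FT_ defines come from the text above.

```legato
// Hypothetical helper: returns ERROR_NONE only when both inputs look right.
int check_inputs(string html_name, string csv_name) {

    dword type;

    // No second argument, so the application examines the file content.
    type = GetFileTypeCode(html_name);
    if (type != FT_HTML) {
      MessageBox('x', "%s does not appear to be an HTML file.", html_name);
      return ERROR_EXIT;
      }

    type = GetFileTypeCode(csv_name);
    if (type != FT_CSV) {
      MessageBox('x', "%s does not appear to be a CSV file.", csv_name);
      return ERROR_EXIT;
      }

    return ERROR_NONE;
    }
```

How strict the comparison should be depends on the script. The list above has a separate FT_IXBRL define for Inline XBRL, for instance, so a script that also accepts those files would need to allow that code as well.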

It is important to note that you can also switch between the codes and strings with the following two functions:


dword = FileTypeStringToCode ( string code );

string = FileTypeCodeToString ( dword code );


These functions simply take the file type in one format and convert it to the other. This can be useful if you want to be efficient with memory by using the dword codes but want the more human-friendly strings in a log file.
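A short sketch of that logging idea follows. The function name log_file_type is hypothetical, and AddMessage is assumed to write a formatted line to the script's default log. The script works with the dword code and converts it to its string form only when writing the log entry.

```legato
// Hypothetical helper: record the detected type of a file in the log.
int log_file_type(string name) {

    dword type;

    type = GetFileTypeCode(name);                    // compact code to store or compare

    // Convert to the readable define name (e.g. FT_HTML) only for the log line.
    AddMessage("%s detected as %s", name, FileTypeCodeToString(type));

    return ERROR_NONE;
    }
```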


The last function I want to discuss is a more powerful version of the GetFileTypeCode and GetFileTypeString functions. This is the GetFileTypeData function.


string[] = GetFileTypeData ( string source );


This function works like the other two, but instead of returning a code or string it returns an array of properties about the file. These go beyond the size and modified time to include the file’s metadata (if the application knows how to read it for that file type). For example, running this function on a GoFiler project would give you the following properties:



FileTypeCode: 0x00007905
FileTypeString: FT_GOFILER_PROJECT_3X
ExtensionTypeCode: 0x00007904
ExtensionTypeString: FT_GOFILER_PROJECT
TypeDescription: GoFiler Project
FilePath: C:\Users\david.theis\Desktop\XBRL Testing\
FileName: test.gfp
FileSize: 6173
FileCreateTime: 2018-04-10T16:13:48
FileModifiedTime: 2013-11-15T16:28:46
MetaAuthor: David Theis
MetaKeywords: 09-30-2012
MetaSubject: 0000990681
MetaTitle: 10-Q


The properties include the type of the file as well as the creator of the project, the report period, CIK, and form type. With this information you can show metadata about a user’s chosen file on a dialog to help them verify it was the proper choice. Additionally, for image files you can get the dimensions of the picture.
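
Here is a minimal sketch of showing those properties to the user. It assumes the returned array is keyed by the property names in the sample output (FileName, TypeDescription, MetaTitle, MetaAuthor), which this post does not state outright. The function name show_file_info and the MessageBox call with the 'i' icon are also assumptions.

```legato
// Hypothetical helper: show a one-line summary of the chosen file.
int show_file_info(string name) {

    string props[];

    props = GetFileTypeData(name);

    // Keys are assumed to match the property names in the sample output above.
    MessageBox('i', "%s (%s): form %s, created by %s",
               props["FileName"], props["TypeDescription"],
               props["MetaTitle"], props["MetaAuthor"]);

    return ERROR_NONE;
    }
```

Not every file type carries metadata, so a real script would want to handle missing or empty properties before displaying them.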


Now that you know how to check the contents of files, be sure to use this knowledge to improve your next script. With Legato, checking files is easier than ever.


 


David Theis has been developing software for Windows operating systems for over fifteen years. He has a Bachelor of Sciences in Computer Science from the Rochester Institute of Technology and co-founded Novaworks in 2006. He is the Vice President of Development and is one of the primary developers of GoFiler, a financial reporting software package designed to create and file EDGAR XML, HTML, and XBRL documents to the U.S. Securities and Exchange Commission.

Additional Resources

Novaworks’ Legato Resources

Legato Script Developers LinkedIn Group

Primer: An Introduction to Legato 



Posted by
David Theis
in Development at 16:12