
Friday, May 17. 2019

LDC #136: Using Legato to Automate Document Conversion

One of the most important features of GoFiler is the ability to take Microsoft Office documents and turn them into EDGAR-compliant documents that can be filed with the SEC. While conversion is easy to do manually in GoFiler, Legato provides a function that lets you make conversion one step in an automated workflow, which can make setting up filings a breeze. Today I will go in-depth on this function: ConvertFile.




Before we begin I would like to take a second and get some of the legal discussion out of the way. Legato allows you to automate the conversion process. However, this feature is not intended to be used outside of the normal licensing of GoFiler. Use of the conversion features in no way abrogates the per workstation licensing requirements of the application or its components. Use of the conversion SDK functions serving commercial production services, web services, or large scale filing is prohibited under the standard license agreement unless covered by a separate and specific license agreement. Essentially, this is provided as a tool to make your own conversion faster, not as a way to build the conversion into your own conversion software.


With that out of the way, let’s dive into what you can do with the ConvertFile function. This is the syntax:


    int = ConvertFile ( string source, string destination, [handle hLog], [string options] );

This is an extremely flexible function. All it requires is the location of a file to convert and the location where Legato should write the converted file. The function does all of the heavy lifting for you. It returns an int: ERROR_NONE on success or a formatted error code on failure.
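
As a minimal sketch of the simplest call, the snippet below converts a single file and checks the return value. The paths are placeholders for illustration only, and the optional log and options parameters are left out so the defaults apply.

```
int rc;

/* Placeholder paths, for illustration only. */
rc = ConvertFile("D:\\Test\\Test.docx", "D:\\Test\\Test.htm");
if (rc == ERROR_NONE) {
  AddMessage("Conversion succeeded.");
  }
else {
  AddMessage("Conversion failed with error code 0x%08X.", rc);
  }
```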


The ConvertFile function will use GoFiler’s conversion functions to convert a number of different file types to HTML or ASCII text files. Here is a short list of the options that you can use:


  • CSV to HTML
  • CSV to Text
  • DOC(X) to HTML
  • DOC(X) to Text
  • HTML to HTML
  • RTF to Text
  • XLS(X) to HTML
  • XLS(X) to Text
  • Text to HTML
  • PDF to HTML
  • PPT to HTML

You tell the function which option you are using by giving it two filenames including the extensions. The function does the rest for you, including figuring out what conversion mode is necessary.


The conversion functions in GoFiler may use a number of outside programs under the hood in order to perform the conversion. For example, converting Word or Excel documents into HTML will use OLE automation to open these programs on the computer and get information out of them. This means that Microsoft Office must be installed for the function to work properly. If the OLE automation fails, an error will be returned and the conversion will fail. Another important detail is that if a file already exists at the destination path, the function will overwrite it without asking. If this is going to be a problem, you will have to check whether the file exists before you convert.
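
One possible way to guard against the overwrite, using only functions that appear elsewhere in this post, is sketched below. It rests on an assumption that has not been verified here: that EnumerateFiles will match an exact filename when the pattern contains no wildcard. The paths are placeholders, and because the FOLDER_LOAD_FOLDER_NAMES flag is reused from the example script, a folder with the destination name would presumably also count as a match.

```
string source;
string destination;
string existing[];
int rc;

/* Placeholder paths, for illustration only. */
source = "D:\\Test\\Test.docx";
destination = "D:\\Test\\Test.htm";

/* Assumption: an exact filename works as the search pattern. */
existing = EnumerateFiles(destination, FOLDER_LOAD_FOLDER_NAMES);
if (ArrayGetAxisDepth(existing) > 0) {
  AddMessage("Skipped %s because %s already exists.", source, destination);
  }
else {
  rc = ConvertFile(source, destination);
  if (rc != ERROR_NONE) {
    AddMessage("File %s failed to convert with error code 0x%08X.", source, rc);
    }
  }
```

If the Legato Script Reference offers a dedicated file-existence check, that would be the cleaner choice.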


The optional parameters are a handle to a log object where conversion details will be put and a string of options for the conversion. If a log is not given to the function, the default log will be used instead. The options string can be a set of property: value; parameters that will override the application’s current conversion settings for the properties that are specified. If you do not need to override any of the conversion settings that are currently set in the application preferences, you can leave this parameter blank. For a full list of properties that can be set please refer to the Legato Script Reference, Chapter 22.1.2: Conversion Options and Parameters.


The final note about the ConvertFile function before I show an example script is that the options are limited by the product that you are using. If the application does not support the conversion mode that is being requested, an ERROR_UNSUPPORTED (0x86000000) will be returned. This will also happen if you pass an incorrect conversion mode to the function. An example of this would be if you ran:


    ConvertFile("D:\\Test\\Test.docx", "D:\\Test\\Test.pdf");

Executing this code will return ERROR_UNSUPPORTED because converting a Word document to PDF is not one of the supported conversion modes. The same error is returned when the specific application does not support a conversion. For example, GoXBRL cannot convert PDF to HTML. Check the returned error code carefully so that you know what has occurred before continuing onward with your script.
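
A small sketch of that check might look like the following. It assumes the ERROR_UNSUPPORTED constant is available by name in the script and that the value comes back exactly as 0x86000000, with no extra detail bits added to it. If that does not hold in your version, the comparison would need adjusting.

```
int rc;

rc = ConvertFile("D:\\Test\\Test.docx", "D:\\Test\\Test.pdf");
if (rc == ERROR_UNSUPPORTED) {
  /* Either the mode is invalid or this product cannot perform it. */
  AddMessage("This conversion is not supported by this application.");
  }
else {
  if (rc != ERROR_NONE) {
    /* Some other failure, such as OLE automation not working. */
    AddMessage("Conversion failed with error code 0x%08X.", rc);
    }
  }
```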


Now let’s take a look at an example script that I have put together to show off the function in action. This is a fairly simple script where a user can choose a local folder and the script will convert all .docx files to HTML documents.


/*****************************************************************************************************************

        Conversion Example
        ------------------
        
        Revision:

            05-17-19  JCK       Initial creation

        Notes:
        
            -
                                  (c) 2019 Novaworks, LLC. All Rights Reserved.


*****************************************************************************************************************/
    
    int rc;
    string folder;
    string files[];
    string newname;
    int numfiles;
    int count;

    folder = BrowseFolder("Select Folder to Convert");
    rc = GetLastError();
    files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
    numfiles = ArrayGetAxisDepth(files);
    while (numfiles == 0 && rc == ERROR_NONE) {
      MessageBox("No files found to convert. \r\nSelect a folder with Word documents.");
      folder = BrowseFolder("Select Folder to Convert", folder);
      rc = GetLastError();
      files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
      numfiles = ArrayGetAxisDepth(files);
      }
    
    ProgressOpen("Converting Files");
    count = 0;
    while (count < numfiles) {
      ProgressUpdate(count+1, numfiles);
      ProgressSetStatus(1, "Converting file %d of %d", count+1, numfiles);
      ProgressSetStatus(2, "%s", files[count]);
      newname = ReplaceInString(files[count], ".docx", ".htm", false);
      if (newname != "") {
        rc = ConvertFile(folder+files[count], folder+newname);
        if (rc == ERROR_NONE) {
          AddMessage("File %s successfully converted.", files[count]);
          }
        else {
          AddMessage("File %s failed to convert with error code 0x%08X.", files[count], rc);
          }
        }
      count++;
      }
    ProgressClose();

This script can be broken up into two halves: getting the folder from the user and converting all of the documents. As always, we start the script by declaring some variables and then we do this:


    folder = BrowseFolder("Select Folder to Convert");
    rc = GetLastError();
    files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
    numfiles = ArrayGetAxisDepth(files);
    while (numfiles == 0 && rc == ERROR_NONE) {
      MessageBox("No files found to convert. \r\nSelect a folder with Word documents.");
      folder = BrowseFolder("Select Folder to Convert", folder);
      rc = GetLastError();
      files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
      numfiles = ArrayGetAxisDepth(files);
      }

We ask the user for a folder in which we can search for files to convert. When one is selected, we use the EnumerateFiles function to search the folder for any files with a “.docx” extension, and we then see how many of those files were found. If the number is zero and the user clicked on the “OK” button (if the user clicks on “Cancel”, rc will be ERROR_CANCEL), we show a message box asking the user to select another folder. We then display the same dialog again, this time opening to the location that was selected last time, so if the user accidentally clicked on a folder one level above where he or she meant to, it is easy to select the correct location. This loop will continue to run until a folder containing Word documents is selected or the user cancels the dialog. We then take the array of files we found for conversion and use it as the basis for our next loop:


    ProgressOpen("Converting Files");
    count = 0;
    while (count < numfiles) {
      ProgressUpdate(count+1, numfiles);
      ProgressSetStatus(1, "Converting file %d of %d", count+1, numfiles);
      ProgressSetStatus(2, "%s", files[count]);
      newname = ReplaceInString(files[count], ".docx", ".htm", false);
      if (newname != "") {
        rc = ConvertFile(folder+files[count], folder+newname);
        if (rc == ERROR_NONE) {
          AddMessage("File %s successfully converted.", files[count]);
          }
        else {
          AddMessage("File %s failed to convert with error code 0x%08X.", files[count], rc);
          }
        }
      count++;
      }
    ProgressClose();

We open a progress box with the ProgressOpen function as conversions can take some time and we do not want to leave the user hanging. We then enter a loop going through each file in the array. At the beginning of each time through the loop, we update the progress bar with the ProgressUpdate function and set the status with the ProgressSetStatus function to report not only how far we are through the process but also the current filename being converted. Files can take a longer or shorter time to convert depending on the size and complexity, so we can provide the user with exact information about which files are taking up processing cycles.


Our next step is to figure out what the new name of the file should be. In this case, since we are converting from a Word document to HTML, we can search for “.docx” in the name string and replace it with “.htm” with the ReplaceInString function. We also make sure that this search is case-insensitive so that we don’t miss a file. As long as the new name is not empty, we do our conversion with the ConvertFile function. We log a message either way: a success message if the conversion returns ERROR_NONE, and the error code if it returns anything else. We then repeat this code for every file that we found in the folder, finish up our loop, and close our progress window. The default log will be shown after the script ends.


You’ll notice that this script does not include a main or a hook like a lot of our examples. In this case I was writing an example that could be used as a separate function and included as part of a larger program, such as a start-of-day routine that converts everything waiting in a folder before someone sits down to clean up the newly converted documents.
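
As a rough sketch of that idea, the conversion loop could be wrapped in a function that a larger script calls. The names convert_folder and failures are hypothetical and chosen only for illustration. The sketch keeps the path handling of the script above (folder+files[count]), so it makes the same assumption about the folder string that BrowseFolder returns, and it leaves out the progress window for brevity.

```
/* Hypothetical helper: converts every .docx file in a folder to HTML */
/* and returns the number of files that failed to convert.            */
int convert_folder(string folder) {

  string files[];
  string newname;
  int numfiles;
  int count;
  int failures;
  int rc;

  files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
  numfiles = ArrayGetAxisDepth(files);
  failures = 0;
  count = 0;
  while (count < numfiles) {
    newname = ReplaceInString(files[count], ".docx", ".htm", false);
    rc = ConvertFile(folder+files[count], folder+newname);
    if (rc != ERROR_NONE) {
      AddMessage("File %s failed to convert with error code 0x%08X.", files[count], rc);
      failures++;
      }
    count++;
    }
  return failures;
  }

int main() {

  string folder;
  int failures;
  int rc;

  folder = BrowseFolder("Select Folder to Convert");
  rc = GetLastError();
  if (rc != ERROR_NONE) {
    /* The user cancelled or the dialog failed, so there is nothing to do. */
    return rc;
    }
  failures = convert_folder(folder);
  AddMessage("%d file(s) failed to convert.", failures);
  return ERROR_NONE;
  }
```

Returning a failure count rather than a single error code lets the calling script decide whether a partially converted folder is acceptable.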


Converting files is one of GoFiler’s most powerful features. Respecting the rules of the EDGAR system is no easy task, but the conversion tools offered make it easy to do, and using the ConvertFile function in Legato allows you to easily integrate this conversion into your own personal steps for creating filings and otherwise modifying files.


 


Joshua Kwiatkowski is a developer at Novaworks, primarily working on Novaworks’ cloud-based solution, GoFiler Online. He is a graduate of the Rochester Institute of Technology with a Bachelor of Science degree in Game Design and Development. He has been with the company since 2013.

Additional Resources

Novaworks’ Legato Resources

Legato Script Developers LinkedIn Group

Primer: An Introduction to Legato 

Posted by
Joshua Kwiatkowski
in Development at 17:12