• Solutions
    • FERC XBRL Reporting
    • FDTA Financial Reporting
    • SEC Compliance
    • Windows Clipboard Management
    • Legato Scripting
  • Products
    • GoFiler Suite
    • XBRLworks
    • SEC Exhibit Explorer
    • SEC Extractor
    • Clipboard Scout
    • Legato
  • Education
    • Training
    • SEC and EDGAR Compliance
    • Legato Developers
  • Blog
  • Support
  • Skip to blog entries
  • Skip to archive page
  • Skip to right sidebar

Friday, March 29. 2019

LDC #129: Building a Section 16 Data Scraper, Part 3: Analysis

It’s time to finish up our base script for our Section 16 data scraper. In the first two parts of this series we started a script that would go out to the SEC’s website, download all of the filings from a particular CIK, cache all of the Section 16 filings, find all of the reported transactions, and put those transactions into a CSV file. This week we are going to finish up the base functionality by reading through all of the transactions and finding the current holdings. Additionally we are going to perform a large amount of refactoring as well as add in the functionality to look at reported holdings.


Let’s first talk about some of the refactoring that we will be doing this week. To refactor our code we are going to look ahead to the future. What are our plans for the scraper? My plans for this script is to use it to populate a dialog control so that a user can specify a CIK and behind the scenes our script retrieves all the information. That way we can dynamically show the user the information that they want. Using the script in this way makes it even easier for a user to get someone’s latest holdings.


Data analysisThe easiest way to prepare for something like that is to split out each part of our large main function into its own functions. This way later on we can have the dialog call each function as needed - as we need more information we can retrieve more information. That means our final script today has a main function, but the main function is little more than a test jig now that calls each function and then prints out our findings.


Here’s this week’s final script:


/*****************************************************************************************************************

        Section 16 Data Collector
        -------------------------
        
        Revision:

            01-04-19  JCK       Initial creation - get filings from SEC
            02-08-19  JCK       Part 2 - Store information
            03-29-19  JCK       Part 3 - Refactoring and getting holdings

        Notes:
        
            -Stage 3
                                  (c) 2019 Novaworks, LLC. All Rights Reserved.


*****************************************************************************************************************/

#define TEST_CIK        "0001091423" //Robert Kotick 
#define COL_TORH        0
#define COL_AORD        1
#define COL_TDATE       2
#define COL_TTYPE       3
#define COL_DORI        4
#define COL_NUMTRANS    5
#define COL_NUMOWNED    6
#define COL_CIK         7
#define COL_NAME        8

    handle              hSheet;
    handle              file;
    string              s16filings[];                   //Table of all filings
    string              s16holdings[][];                //Table of current holdings

    boolean             is_section_16           (string type);
    void                add_transaction         (string issuer);
    void                add_holding             (string issuer, string date);
    int                 populate_filings        (string cache, string c_cik);
    void                get_current_holdings    (string cache);
    void                parse_filings           (string cache, string c_cik);
    
    void main() { 
    
      //creation variables
      string            cache;                          //Location of cache
      string            c_cik;                          //CIK of Reporting Owner
      int               count;                          //Counting Variable
      int               max;
      
      c_cik = TEST_CIK;
      //Find the location for our cache
      cache = GetScriptFolder() + "Cache\\" + c_cik;
      //create cache
      CreateFolders(cache);
      if (IsFolder(cache) == FALSE) {                                   
        AddMessage('x', "Unable to create document cache.");    
        return;
        }                                                                       
      //Populate filings array based on SEC's data
      populate_filings(cache, c_cik);
      //Parse through filings to find all transactions
      parse_filings(cache, c_cik);
      //Parse through transactions and holdings to find current number
      get_current_holdings(AddPaths(cache, c_cik + ".csv"));
      
      count = 0;
      max = ArrayGetAxisDepth(s16holdings, AXIS_ROW);
      int printed;
      printed = 1;
      //Print out all of our current holdings
      while (count < max) {
        if (s16holdings[count][COL_NUMOWNED] != "0") {
          AddMessage("%-11s %02d: CIK: %08s Name: %-50s Securities Owned: %-10s Last Reported: %10s", s16holdings[count][COL_TORH], printed++, s16holdings[count][COL_CIK], s16holdings[count][COL_NAME], s16holdings[count][COL_NUMOWNED], s16holdings[count][COL_TDATE]);
          }
        count++;
        }

      }
    
    //Returns true if S16 file type, false if anything else
    boolean is_section_16(string type) {
    
      switch (type) {
        case "3":
          return true;
        case "4":
          return true;
        case "5":
          return true;
        case "3/A":
          return true;
        case "4/A":
          return true;
        case "5/A":
          return true;
        }
      return false;
      }
    
    void parse_filings(string cache, string c_cik) {
      int               count, max, rc;
      string            date, cik;
      string            parser;
      
      //Get number of filings
      count = ArrayGetAxisDepth(s16filings) - 1;
      //Create data sheet
      hSheet = DataSheetCreate(2, 9);
      //Set the header row
      DataSheetSetCellText(hSheet, 0, COL_TORH, "Transaction or Holding");
      DataSheetSetCellText(hSheet, 0, COL_AORD, "Acquisition or Disposition");
      DataSheetSetCellText(hSheet, 0, COL_TDATE, "Transaction Date");
      DataSheetSetCellText(hSheet, 0, COL_TTYPE, "Transaction Type");
      DataSheetSetCellText(hSheet, 0, COL_DORI, "Direct or Indirect Ownership");
      DataSheetSetCellText(hSheet, 0, COL_NUMTRANS, "Number of Securities Transacted");
      DataSheetSetCellText(hSheet, 0, COL_NUMOWNED, "Number of Securities Owned");
      DataSheetSetCellText(hSheet, 0, COL_CIK, "Issuer CIK");
      DataSheetSetCellText(hSheet, 0, COL_NAME, "Security Name");
      
       //Open progress bar
      ProgressOpen("Reading Filings");
      ProgressSetPhaseCount(count);
      max = count;
      
      //Go through filings
      while (count >= 0) {
        //Update progress
        ProgressSetPhase(max-count);
        ProgressSetStatus(FormatString("Reading file %d of %d", max-count, max));
        ProgressUpdate(max-count, max);
        //AddMessage("Parsing file %s", s16filings[count]);
        date = "";
        //Create SGML parser
        file = SGMLCreate();
        //Set parse to this file
        rc = SGMLSetFile(file, s16filings[count]);
        //Check for errors
        if (rc != ERROR_NONE) {
          AddMessage("Error on Setting File %s: 0x%08X", s16filings[count], rc);
          }
        //Get first element
        parser = SGMLNextElement(file);
        rc = GetLastError();
        //Check for issuer CIK
        while (parser != "<periodOfReport>" && rc == ERROR_NONE) {
          parser = SGMLNextElement(file);
          rc = GetLastError();
          }
        //Get CIK
        date = SGMLNextItem(file);
        //Check for issuer CIK
        while (parser != "<issuerCik>" && rc == ERROR_NONE) {
          parser = SGMLNextElement(file);
          rc = GetLastError();
          }
        //Get CIK
        cik = SGMLNextItem(file);
        if (rc != ERROR_NONE) {
          AddMessage("Error on retrieving CIK: 0x%08X", rc);
          }
        
        //Until end of file
        while (rc == ERROR_NONE) {
          //check if element is beginning of transaction
          if (parser == "<nonDerivativeTransaction>" || parser == "<derivativeTransaction>") {
            //Add transaction
            add_transaction(cik);
            }
          if (parser == "<nonDerivativeHolding>" || parser == "<derivativeHolding>") {
            //Add transaction
            add_holding(cik, date);
            }
          
          //Next element
          parser = SGMLNextElement(file);
          rc = GetLastError();
          }
        //Done with SGML parser
        CloseHandle(file);
        //Next file
        count--;
        }
      
      //Export data to CSV
      rc = DataSheetExport(hSheet, AddPaths(cache, c_cik + ".csv"));
      if (rc != ERROR_NONE) {
        AddMessage("Error on Creating CSV: 0x%08X", rc);
        }
      CloseHandle(hSheet);
      //AddMessage("Finished Parsing files from CIK: %s", TEST_CIK);
    }
    
    void add_transaction(string issuer) {
    
      string            aord, date, ttype, dori, numtrans, numowned, secname;
      string            parser;
      int               rc;
      
      //Initialize variables
      aord = "";
      date = "";  
      ttype = "";   
      dori = "";
      numtrans = "";
      numowned = "";  
      secname = ""; 
      
      //Get next element
      parser = SGMLNextElement(file);
      //Check for end of the transaction
      while (parser != "</nonDerivativeTransaction>" && parser != "</derivativeTransaction>" && rc == ERROR_NONE){
        if (parser == "<securityTitle>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          secname = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            secname += SGMLNextItem(file);
            }
          secname = ReplaceInString(secname, "</value>", "");
          }
        if (parser == "<transactionDate>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          date = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            date += SGMLNextItem(file);
            }
          date = ReplaceInString(date, "</value>", "");
          }
        if (parser == "<transactionCode>"){
          ttype = SGMLNextItem(file);
        }
        if (parser == "<transactionShares>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          numtrans = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            numtrans += SGMLNextItem(file);
            }
          numtrans = ReplaceInString(numtrans, "</value>", "");
          }
        if (parser == "<transactionAcquiredDisposedCode>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          aord = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            aord += SGMLNextItem(file);
            }
          aord = ReplaceInString(aord, "</value>", "");
          }
        if (parser == "<sharesOwnedFollowingTransaction>"){ 
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          numowned = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            numowned += SGMLNextItem(file);
            }
          numowned = ReplaceInString(numowned, "</value>", "");
          }
        if (parser == "<directOrIndirectOwnership>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          dori = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            dori += SGMLNextItem(file);
            }
          dori = ReplaceInString(dori, "</value>", "");
          }
        parser = SGMLNextElement(file);
        rc = GetLastError();
        }
      //Insert a row underneath the header row
      DataSheetRowInsert(hSheet, 1);
      //Insert all of the data
      DataSheetSetCellText(hSheet, 1, COL_TORH, "Transaction"); 
      DataSheetSetCellText(hSheet, 1, COL_AORD, aord); 
      DataSheetSetCellText(hSheet, 1, COL_TDATE, date);
      DataSheetSetCellText(hSheet, 1, COL_TTYPE, ttype);
      DataSheetSetCellText(hSheet, 1, COL_DORI, dori);
      DataSheetSetCellText(hSheet, 1, COL_NUMTRANS, numtrans);
      DataSheetSetCellText(hSheet, 1, COL_NUMOWNED, numowned);
      DataSheetSetCellText(hSheet, 1, COL_CIK, issuer);
      DataSheetSetCellText(hSheet, 1, COL_NAME, secname);
      }

    void add_holding(string issuer, string date) {
    
      string            aord, ttype, dori, numtrans, numowned, secname;
      string            parser;
      int               rc;
      
      //Initialize variables
      dori = "";
      numowned = "";  
      secname = ""; 
      
      //Get next element
      parser = SGMLNextElement(file);
      //Check for end of the transaction
      while (parser != "</nonDerivativeHolding>" && parser != "</derivativeHolding>" && rc == ERROR_NONE){
        if (parser == "<securityTitle>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          secname = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            secname += SGMLNextItem(file);
            }
          secname = ReplaceInString(secname, "</value>", "");
          }
        if (parser == "<sharesOwnedFollowingTransaction>"){ 
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          numowned = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            numowned += SGMLNextItem(file);
            }
          numowned = ReplaceInString(numowned, "</value>", "");
          }
        if (parser == "<directOrIndirectOwnership>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          dori = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            dori += SGMLNextItem(file);
            }
          dori = ReplaceInString(dori, "</value>", "");
          }
        parser = SGMLNextElement(file);
        rc = GetLastError();
        }
      //Insert a row underneath the header row
      DataSheetRowInsert(hSheet, 1);
      //Insert all of the data
      DataSheetSetCellText(hSheet, 1, COL_TORH, "Holding"); 
      DataSheetSetCellText(hSheet, 1, COL_TDATE, date);
      DataSheetSetCellText(hSheet, 1, COL_DORI, dori);
      DataSheetSetCellText(hSheet, 1, COL_NUMOWNED, numowned);
      DataSheetSetCellText(hSheet, 1, COL_CIK, issuer);
      DataSheetSetCellText(hSheet, 1, COL_NAME, secname);
      }
      
    int populate_filings(string cache, string c_cik) {
      //creation variables
      string            files[];                        //Files from SEC
      string            name;                           //Name of file
      boolean           s16Check;                       //Placeholder boolean
      int               rc;                             //Error Checker
      string            base;                           //String for name storage
      int               count;
      int               numfiles;
      int               max;
      
      //Get the archive list   
      files = EDGARFetchArchiveList(c_cik);
      rc = GetLastError();
      //Check is CIK exists
      if (rc == (ERROR_REMOTE | ERROR_FILE_NOT_FOUND)) {
        AddMessage("CIK Does Not Exist");
        return;
        }
      //If too many filings
      if (rc == ERROR_OVERFLOW) {
        files = EDGARFetchArchiveList(c_cik, FALSE, 0, 0, TRUE);
        }
      
      //Initialize needed variables                                                                     
      max = ArrayGetAxisDepth(files);
      count = 0;
      numfiles = 0;
      
      //Open progress bar
      ProgressOpen("Getting Files");
      ProgressSetPhaseCount(max-1);
      
      //While there are still files to go through
      while (count < max) {
        //Update progress
        ProgressSetPhase(count);
        ProgressSetStatus(FormatString("Getting file %d of %d", count, max));
        ProgressUpdate(count, max);
        
        //Get filename
        base = ReplaceInString(GetFilename(files[count]), ".txt", ".xml");
        //Check if already downloaded
        if (DoesFileExist(AddPaths(cache, base)) == FALSE) {
          //Open archive
          file = EDGARArchiveOpen(files[count]);
          //Make sure a S16 filing
          s16Check = is_section_16(EDGARArchiveGetDocType(file, 0));
          //If s16
          if (s16Check) { 
            if (EDGARArchiveGetDocFileType(file, 0) == FT_XML) {
              //Set name of XML file
              name = AddPaths(cache, EDGARArchiveGetProperty(file, "accession_number") + ".xml");
              //Get XML file and save it locally
              EDGARArchiveGetDocFile(file, 0, name);
              //Add to the list
              s16filings[numfiles++] = name;
              }
            }
          CloseHandle(file);
          }
        else {
          //Add cached file to list
          s16filings[numfiles++] = AddPaths(cache, base);
          }
        count++;
        }
      //Close progress
      ProgressClose();
      
      return ERROR_NONE;
      }
      
    //Get current holdings
    void get_current_holdings(string cache) {
      int               count;
      int               numholdings;
      int               countholdings;
      int               max;
      int               intcount;
      string            aord, date, ttype, dori, numtrans, numowned, secname, cik, type;
      string            filings[][];
      boolean           found;
      //Clear the array
      ArrayClear(s16holdings);
      //Get all transactions from filings
      filings = CSVReadTable(cache);
      //Set a counter
      max = ArrayGetAxisDepth(filings, AXIS_ROW);
      count = 1;
      //Currently at 0 holdings
      numholdings = 0;
      //Go through all transactions
      while (count < max-1) {
        //get all of the values
        type =     filings[count][COL_TORH];
        aord =     filings[count][COL_AORD];
        date =     filings[count][COL_TDATE];   
        ttype =    filings[count][COL_TTYPE];   
        dori =     filings[count][COL_DORI];    
        numtrans = filings[count][COL_NUMTRANS];
        numowned = filings[count][COL_NUMOWNED];
        cik =      filings[count][COL_CIK];
        secname =  filings[count][COL_NAME];
        //set counter to 0
        countholdings = 0;
        //Not found in current holdings
        found = false;
        //Go through current holdings
        while (countholdings < numholdings && found == false) { 
          //Same CIK?
          if (s16holdings[countholdings][COL_CIK] == cik) {
            //Same security name?
            if (CompareStringsNoCase(s16holdings[countholdings][COL_NAME], secname) == 0 ) {
              //Found already reported
              found = true;
              }
            }
          countholdings++;
          }
        //If not reported yet
        if (found == false) {
          //Set holding 
          s16holdings[numholdings][COL_TORH] = type;
          s16holdings[numholdings][COL_AORD] = aord;
          s16holdings[numholdings][COL_TDATE] = date;  
          s16holdings[numholdings][COL_TTYPE] = ttype;
          s16holdings[numholdings][COL_DORI] = dori;  
          s16holdings[numholdings][COL_NUMTRANS] = numtrans;
          s16holdings[numholdings][COL_NUMOWNED] = numowned;
          s16holdings[numholdings][COL_CIK] = cik;
          s16holdings[numholdings][COL_NAME] = secname;
          numholdings++;
          }
        count++;
        }
      }

Note that this script is getting to be rather large (almost 500 lines of code now). We’re not going to go through every single line, but I’m going to highlight some of the changes that have been made and I’ll talk in detail about the brand new code in here.


    void main() { 
    
      //creation variables
      string            cache;                          //Location of cache
      string            c_cik;                          //CIK of Reporting Owner
      int               count;                          //Counting Variable
      int               max;
      
      c_cik = TEST_CIK;
      //Find the location for our cache
      cache = GetScriptFolder() + "Cache\\" + c_cik;
      //create cache
      CreateFolders(cache);
      if (IsFolder(cache) == FALSE) {                                   
        AddMessage('x', "Unable to create document cache.");    
        return;
        }                                                                       
      //Populate filings array based on SEC's data
      populate_filings(cache, c_cik);
      //Parse through filings to find all transactions
      parse_filings(cache, c_cik);
      //Parse through transactions and holdings to find current number
      get_current_holdings(AddPaths(cache, c_cik + ".csv"));
      
      count = 0;
      max = ArrayGetAxisDepth(s16holdings, AXIS_ROW);
      int printed;
      printed = 1;
      //Print out all of our current holdings
      while (count < max) {
        if (s16holdings[count][COL_NUMOWNED] != "0") {
          AddMessage("%-11s %02d: CIK: %08s Name: %-50s Securities Owned: %-10s Last Reported: %10s", s16holdings[count][COL_TORH], printed++, s16holdings[count][COL_CIK], s16holdings[count][COL_NAME], s16holdings[count][COL_NUMOWNED], s16holdings[count][COL_TDATE]);
          }
        count++;
        }

      }

Our main() function is now very simple. We take our testing CIK and create a cache folder. We then call our three main action functions: populate_filings(), parse_filings(), and get_current_holdings(). Finally we go through each holding and print them out to the screen. If you were to run this script as-is you would get this:


Holding     01: CIK: 0000718877 Name: Common Stock, par value $0.000001 per share        Securities Owned: 221288     Last Reported: 2018-12-28
Transaction 02: CIK: 0000021344 Name: Phantom Stock Units                                Securities Owned: 43003.2605 Last Reported: 2018-03-29
Holding     03: CIK: 0000021344 Name: Common Stock, $.25 Par Value                       Securities Owned: 18         Last Reported: 2018-03-29
Transaction 04: CIK: 0000718877 Name: Employee Stock Options                             Securities Owned: 190712     Last Reported: 2017-08-07
Holding     05: CIK: 0000718877 Name: Common Stock, par value $0.000001                  Securities Owned: 9600       Last Reported: 2010-06-02
Holding     06: CIK: 0000718877 Name: Common Stock, par value $0.0000001 per share       Securities Owned: 160610     Last Reported: 2009-11-13
Transaction 07: CIK: 0000718877 Name: Common Stock, par value $0.00001 per share         Securities Owned: 4475288    Last Reported: 2007-12-24
Transaction 08: CIK: 0001011006 Name: Director Stock Option                              Securities Owned: 322        Last Reported: 2007-12-31
Transaction 09: CIK: 0001011006 Name: Common Stock                                       Securities Owned: 10000      Last Reported: 2007-06-12
Transaction 10: CIK: 0001011006 Name: Stock Option ( right to buy )                      Securities Owned: 119741     Last Reported: 2006-04-21
Transaction 11: CIK: 0000718877 Name: Common Stock, par value $.000001 per share         Securities Owned: 3397246    Last Reported: 2005-09-15
Transaction 12: CIK: 0000913949 Name: Non-Qualified Stock Option (right to buy           Securities Owned: 33332      Last Reported: 2003-09-04
Transaction 13: CIK: 0000913949 Name: Non-Qualified Stock Option (right to buy)          Securities Owned: 7500       Last Reported: 2003-07-24

This is the last reported value from each combination of issuer CIK and security name in the entire history of the reporting owner’s CIK. The data can be a little messy, and we will look at cleaning this up in the next part. Likely, we could filter out any holdings older than a year as holdings are expected to be reported once a year on a form 5. However, as I discovered in my testing of this script, not everyone is filing all of their required filings, so I kept this unfiltered for now. Now let’s quickly look at some of the major highlights of each function, starting with populate_filings().


    int populate_filings(string cache, string c_cik) {
      //creation variables
      string            files[];                        //Files from SEC
      string            name;                           //Name of file
      boolean           s16Check;                       //Placeholder boolean
      int               rc;                             //Error Checker
      string            base;                           //String for name storage
      int               count;
      int               numfiles;
      int               max;
      
      //Get the archive list   
      files = EDGARFetchArchiveList(c_cik);
      rc = GetLastError();
      //Check is CIK exists
      if (rc == (ERROR_REMOTE | ERROR_FILE_NOT_FOUND)) {
        AddMessage("CIK Does Not Exist");
        return;
        }
      //If too many filings
      if (rc == ERROR_OVERFLOW) {
        files = EDGARFetchArchiveList(c_cik, FALSE, 0, 0, TRUE);
        }
      
      //Initialize needed variables                                                                     
      max = ArrayGetAxisDepth(files);
      count = 0;
      numfiles = 0;
      
      //Open progress bar
      ProgressOpen("Getting Files");
      ProgressSetPhaseCount(max-1);
      
      //While there are still files to go through
      while (count < max) {
        //Update progress
        ProgressSetPhase(count);
        ProgressSetStatus(FormatString("Getting file %d of %d", count, max));
        ProgressUpdate(count, max);
        
        //Get filename
        base = ReplaceInString(GetFilename(files[count]), ".txt", ".xml");
        //Check if already downloaded
        if (DoesFileExist(AddPaths(cache, base)) == FALSE) {
          //Open archive
          file = EDGARArchiveOpen(files[count]);
          //Make sure a S16 filing
          s16Check = is_section_16(EDGARArchiveGetDocType(file, 0));
          //If s16
          if (s16Check) { 
            if (EDGARArchiveGetDocFileType(file, 0) == FT_XML) {
              //Set name of XML file
              name = AddPaths(cache, EDGARArchiveGetProperty(file, "accession_number") + ".xml");
              //Get XML file and save it locally
              EDGARArchiveGetDocFile(file, 0, name);
              //Add to the list
              s16filings[numfiles++] = name;
              }
            }
          CloseHandle(file);
          }
        else {
          //Add cached file to list
          s16filings[numfiles++] = AddPaths(cache, base);
          }
        count++;
        }
      //Close progress
      ProgressClose();
      
      return ERROR_NONE;
      }

This code is basically unchanged from last week, other than the fact that we have put it into a function. There’s one change I would like to call your attention to:


          //If s16
          if (s16Check) { 
            if (EDGARArchiveGetDocFileType(file, 0) == FT_XML) {
              //Set name of XML file
              name = AddPaths(cache, EDGARArchiveGetProperty(file, "accession_number") + ".xml");
              //Get XML file and save it locally
              EDGARArchiveGetDocFile(file, 0, name);
              //Add to the list
              s16filings[numfiles++] = name;
              }

The changed line in here is the addition of EDGARArchiveGetDocFileType(). We use this to check to see if the Section 16 document is XML or not. In my initial version of the script I never tested a CIK that had filings from before Section 16 filings were XML based. Since the switch occurred around the year 2003 I decided that if the filing was not XML I did not care about the information inside of it. However, without this line there was a plethora of parsing errors later on that were introduced because older filings did not have the expected XML data inside of them.


    void parse_filings(string cache, string c_cik) {
      int               count, max, rc;
      string            date, cik;
      string            parser;
      
      //Get number of filings
      count = ArrayGetAxisDepth(s16filings) - 1;
      //Create data sheet
      hSheet = DataSheetCreate(2, 9);
      //Set the header row
      DataSheetSetCellText(hSheet, 0, COL_TORH, "Transaction or Holding");
      DataSheetSetCellText(hSheet, 0, COL_AORD, "Acquisition or Disposition");
      DataSheetSetCellText(hSheet, 0, COL_TDATE, "Transaction Date");
      DataSheetSetCellText(hSheet, 0, COL_TTYPE, "Transaction Type");
      DataSheetSetCellText(hSheet, 0, COL_DORI, "Direct or Indirect Ownership");
      DataSheetSetCellText(hSheet, 0, COL_NUMTRANS, "Number of Securities Transacted");
      DataSheetSetCellText(hSheet, 0, COL_NUMOWNED, "Number of Securities Owned");
      DataSheetSetCellText(hSheet, 0, COL_CIK, "Issuer CIK");
      DataSheetSetCellText(hSheet, 0, COL_NAME, "Security Name");
      
       //Open progress bar
      ProgressOpen("Reading Filings");
      ProgressSetPhaseCount(count);
      max = count;
      
      //Go through filings
      while (count >= 0) {
        //Update progress
        ProgressSetPhase(max-count);
        ProgressSetStatus(FormatString("Reading file %d of %d", max-count, max));
        ProgressUpdate(max-count, max);
        //AddMessage("Parsing file %s", s16filings[count]);
        date = "";
        //Create SGML parser
        file = SGMLCreate();
        //Set parse to this file
        rc = SGMLSetFile(file, s16filings[count]);
        //Check for errors
        if (rc != ERROR_NONE) {
          AddMessage("Error on Setting File %s: 0x%08X", s16filings[count], rc);
          }
        //Get first element
        parser = SGMLNextElement(file);
        rc = GetLastError();
        //Check for issuer CIK
        while (parser != "<periodOfReport>" && rc == ERROR_NONE) {
          parser = SGMLNextElement(file);
          rc = GetLastError();
          }
        //Get CIK
        date = SGMLNextItem(file);
        //Check for issuer CIK
        while (parser != "<issuerCik>" && rc == ERROR_NONE) {
          parser = SGMLNextElement(file);
          rc = GetLastError();
          }
        //Get CIK
        cik = SGMLNextItem(file);
        if (rc != ERROR_NONE) {
          AddMessage("Error on retrieving CIK: 0x%08X", rc);
          }
        
        //Until end of file
        while (rc == ERROR_NONE) {
          //check if element is beginning of transaction
          if (parser == "<nonDerivativeTransaction>" || parser == "<derivativeTransaction>") {
            //Add transaction
            add_transaction(cik);
            }
          if (parser == "<nonDerivativeHolding>" || parser == "<derivativeHolding>") {
            //Add transaction
            add_holding(cik, date);
            }
          
          //Next element
          parser = SGMLNextElement(file);
          rc = GetLastError();
          }
        //Done with SGML parser
        CloseHandle(file);
        //Next file
        count--;
        }
      
      //Export data to CSV
      rc = DataSheetExport(hSheet, AddPaths(cache, c_cik + ".csv"));
      if (rc != ERROR_NONE) {
        AddMessage("Error on Creating CSV: 0x%08X", rc);
        }
      CloseHandle(hSheet);
      //AddMessage("Finished Parsing files from CIK: %s", TEST_CIK);
    }

The function parse_filings() has some new functionality from last time. Our previous version of the script only looked for transactions in filings. Now we are looking for and recording holding rows as well as transactions. As a result we now have two functions we can call from parse_filings(): add_transaction() and add_holding(). These functions are similar to each other, but the tags they look for are different as different information is contained within transactions and holdings. Another difference from last time is that we now have another column to store whether rows are transactions or holdings. This column in the CSV is the first column in the list, and so we have incremented the number on all of the columns that existed previously. Finally, since holdings are not guaranteed to have a date listed with them we find the period of the report, and if we are adding a holding we use that date to record the holding.


    void add_holding(string issuer, string date) {
    
      string            aord, ttype, dori, numtrans, numowned, secname;
      string            parser;
      int               rc;
      
      //Initialize variables
      dori = "";
      numowned = "";  
      secname = ""; 
      
      //Get next element
      parser = SGMLNextElement(file);
      //Check for end of the transaction
      while (parser != "</nonDerivativeHolding>" && parser != "</derivativeHolding>" && rc == ERROR_NONE){
        if (parser == "<securityTitle>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          secname = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            secname += SGMLNextItem(file);
            }
          secname = ReplaceInString(secname, "</value>", "");
          }
        if (parser == "<sharesOwnedFollowingTransaction>"){ 
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          numowned = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            numowned += SGMLNextItem(file);
            }
          numowned = ReplaceInString(numowned, "</value>", "");
          }
        if (parser == "<directOrIndirectOwnership>"){
          while (parser != "<value>") {
            parser = SGMLNextItem(file);
            }
          dori = SGMLNextItem(file);
          while (SGMLGetItemType(file) != SPI_TYPE_TAG){
            dori += SGMLNextItem(file);
            }
          dori = ReplaceInString(dori, "</value>", "");
          }
        parser = SGMLNextElement(file);
        rc = GetLastError();
        }
      //Insert a row underneath the header row
      DataSheetRowInsert(hSheet, 1);
      //Insert all of the data
      DataSheetSetCellText(hSheet, 1, COL_TORH, "Holding"); 
      DataSheetSetCellText(hSheet, 1, COL_TDATE, date);
      DataSheetSetCellText(hSheet, 1, COL_DORI, dori);
      DataSheetSetCellText(hSheet, 1, COL_NUMOWNED, numowned);
      DataSheetSetCellText(hSheet, 1, COL_CIK, issuer);
      DataSheetSetCellText(hSheet, 1, COL_NAME, secname);
      }

This code looks almost identical to the add_transaction() code from last time, and that is because it uses the same logic. However, the items that it looks for are different. Holdings contain a great deal less information that transactions, so we instead only care about the number of securities owned, the issuer, and the name of the security. The date is not listed in a row, so we get the date out of the period of report, passed to the function when it is called in parse_filings(). The add_transaction() function is unchanged from last week.


    //Get current holdings
    void get_current_holdings(string cache) {
      int               count;
      int               numholdings;
      int               countholdings;
      int               max;
      int               intcount;
      string            aord, date, ttype, dori, numtrans, numowned, secname, cik, type;
      string            filings[][];
      boolean           found;
      //Clear the array
      ArrayClear(s16holdings);
      //Get all transactions from filings
      filings = CSVReadTable(cache);
      //Set a counter
      max = ArrayGetAxisDepth(filings, AXIS_ROW);
      count = 1;
      //Currently at 0 holdings
      numholdings = 0;

Here is the beginning of our new function. We get passed the location of the cache, and then we get everything ready to find our current holdings. We initialize a bunch of variables. I decided to read all of the values from each row into separate variables. I want to quickly touch on why I did it this way and that is because it is my preference to make code as readable as possible. Since we end up comparing multiple table entries to each other I decided this coding method was easier to read and made it easier to verify I was doing the correct thing. While this ends up being less efficient, the peace of mind of being able to easily understand what is going on makes this decision worth it.


      //Go through all transactions
      while (count < max-1) {
        //get all of the values
        type =     filings[count][COL_TORH];
        aord =     filings[count][COL_AORD];
        date =     filings[count][COL_TDATE];   
        ttype =    filings[count][COL_TTYPE];   
        dori =     filings[count][COL_DORI];    
        numtrans = filings[count][COL_NUMTRANS];
        numowned = filings[count][COL_NUMOWNED];
        cik =      filings[count][COL_CIK];
        secname =  filings[count][COL_NAME];
        //set counter to 0
        countholdings = 0;
        //Not found in current holdings
        found = false;
        //Go through current holdings
        while (countholdings < numholdings && found == false) { 
          //Same CIK?
          if (s16holdings[countholdings][COL_CIK] == cik) {
            //Same security name?
            if (CompareStringsNoCase(s16holdings[countholdings][COL_NAME], secname) == 0 ) {
              //Found already reported
              found = true;
              }
            }
          countholdings++;
          }

Now we go through each reported row from all of our filings. These rows are in chronological order from newest to oldest. This means that we can take the first row and we know that it is a current holding. We then take each row after that and compare it to each current holding. If the issuer CIK is the same we check the security name, and if that is the same we know it is an older transaction for a current holding.


        //If not reported yet
        if (found == false) {
          //Set holding 
          s16holdings[numholdings][COL_TORH] = type;
          s16holdings[numholdings][COL_AORD] = aord;
          s16holdings[numholdings][COL_TDATE] = date;  
          s16holdings[numholdings][COL_TTYPE] = ttype;
          s16holdings[numholdings][COL_DORI] = dori;  
          s16holdings[numholdings][COL_NUMTRANS] = numtrans;
          s16holdings[numholdings][COL_NUMOWNED] = numowned;
          s16holdings[numholdings][COL_CIK] = cik;
          s16holdings[numholdings][COL_NAME] = secname;
          numholdings++;
          }
        count++;
        }
      }

If the current row makes it through all of the holdings and is not matched, it is a current holding and it gets added to the list. We then continue onward and check all of the rows in our filings the same way. This leaves us with the list of newest unique holdings that gets printed out by our main function.


And here we are. At this point all of our main functionality is complete: we take a CIK and turn it into a list of holdings. The next step from here will be to add everything into an interface that an end user can use, rather than forcing us to put a CIK into a script and run it manually. That will be our goal for the next part in this series.


 


Joshua Kwiatkowski is a developer at Novaworks, primarily working on Novaworks’ cloud-based solution, GoFiler Online. He is a graduate of the Rochester Institute of Technology with a Bachelor of Science degree in Game Design and Development. He has been with the company since 2013.

Additional Resources

Novaworks’ Legato Resources

Legato Script Developers LinkedIn Group

Primer: An Introduction to Legato 

Posted by
Joshua Kwiatkowski
in Development at 17:25
Trackbacks
Trackback specific URI for this entry

No Trackbacks

Comments
Display comments as (Linear | Threaded)
No comments
Add Comment
Enclosing asterisks marks text as bold (*word*), underscore are made via _word_.
Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA

 
   
 

Quicksearch

Categories

  • XML Accounting
  • XML AICPA News
  • XML FASB News
  • XML GASB News
  • XML IASB News
  • XML Development
  • XML Events
  • XML FERC
  • XML eForms News
  • XML FERC Filing Help
  • XML Filing Technology
  • XML Information Technology
  • XML Investor Education
  • XML MSRB
  • XML EMMA News
  • XML FDTA
  • XML MSRB Filing Help
  • XML Novaworks News
  • XML GoFiler Online Updates
  • XML GoFiler Updates
  • XML XBRLworks Updates
  • XML SEC
  • XML Corporation Finance
  • XML DERA
  • XML EDGAR News
  • XML Investment Management
  • XML SEC Filing Help
  • XML XBRL
  • XML Data Quality Committee
  • XML GRIP Taxonomy
  • XML IFRS Taxonomy
  • XML US GAAP Taxonomy

Calendar

Back May '25 Forward
Mo Tu We Th Fr Sa Su
Sunday, May 18. 2025
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31  

Feeds

  • XML
Sign Up Now
Get SEC news articles and blog posts delivered monthly to your inbox!
Based on the s9y Bulletproof template framework

Compliance

  • FERC
  • EDGAR
  • EMMA

Software

  • GoFiler Suite
  • SEC Exhibit Explorer
  • SEC Extractor
  • XBRLworks
  • Legato Scripting

Company

  • About Novaworks
  • News
  • Site Map
  • Support

Follow Us:

  • LinkedIn
  • YouTube
  • RSS
  • Newsletter
  • © 2024 Novaworks, LLC
  • Privacy
  • Terms of Use
  • Trademarks and Patents
  • Contact Us