Friday, February 28, 2020

LDC #167: A Quick Web Link Builder Part II

Last month, in “A Quick Web Link Builder”, I described a tool I built to create and insert web news links in my blog. I recently expanded the tool to load the web page and extract the information needed to fill in the dialog, making linking an even quicker process.


Introduction


The problem with adding news links is that one has to add not only the link but also the description and the citation. If you have not read the first article on inserting the links, I recommend taking a look at LDC #165: A Quick Web Link Builder.


In that article, the resource has lines commented out (denoted in blue text in the original post):


#beginresource

#define ASL_URL                                 201
#define ASL_URL_LOOKUP                          202
#define ASL_TEXT                                203
#define ASL_CITE                                204


LinkQuery01Dlg DIALOGEX 0, 0, 280, 118
STYLE DS_3DLOOK | WS_POPUP | WS_VISIBLE | WS_CAPTION
CAPTION "Add Quick Web Link"
FONT 8, "MS Shell Dlg"
{
 CONTROL "Link To:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 4, 30, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 36, 9, 236, 1, 0
 CONTROL "&URL:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 18, 30, 8
 CONTROL "", ASL_URL, "edit", ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 16, 170, 12
 CONTROL "Look Up", ASL_URL_LOOKUP, "button", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 225, 16, 40, 12, 0
 CONTROL "Parameters:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 37, 40, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 46, 42, 226, 1, 0
 CONTROL "&Text:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 53, 30, 8, 0
 CONTROL "", ASL_TEXT, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 51, 220, 12, 0
 CONTROL "&Cite:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 69, 30, 8, 0
 CONTROL "", ASL_CITE, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 67, 220, 12, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 6, 90, 268, 1, 0
 CONTROL "OK", IDOK, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 168, 96, 50, 14
 CONTROL "Cancel", IDCANCEL, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 223, 96, 50, 14
}

#endresource

By uncommenting the lookup button and adding the code described below, we get the following functionality:


Code sections in blue are now necessary


The “Look Up” button loads the specified URL and then captures meta data from the page source to fill the Text and Cite controls for the URL.


In this post, we will be adding a couple of global variables, a host name lookup, a button action function, and the code to retrieve the meta data.


The Button Control


After removing the comment from the resource, we need to add in code to capture the button press:


void lq_action(int c_id, int c_ac) {
    int                 rc;

    if (c_id == ASL_URL_LOOKUP) {
      url = EditGetText(ASL_URL);
      if (url != "") {
        rc = get_url_meta_data();
        if (IsNotError(rc)) {
          EditSetText(ASL_TEXT, title);
          EditSetText(ASL_CITE, cite);
          }
        }
      }
    }

The dialog procedure lq_action has two parameters: the Control ID and the Control Action. The name combines the prefix “lq_”, which is specified in the DialogBox call, with the procedure name “action”, which is predefined by the Legato dialog processor. The control ID is specified by the resource. For buttons, the action code is not relevant.


Our snippet of code simply checks the ID, gets the text of the URL, and, if present, passes it to the get_url_meta_data function. Assuming that function succeeds, the title and cite global variables are loaded to the dialog controls.


We could add all the lookup code directly to the action routine, but that is not advisable. To optimize operation, any complex action should be placed in a subroutine. The action routine is called constantly for each control to process key presses, focus changes, selections, and other messages, so any extra code and local variables declared in it waste processor time on every call. Using a subroutine also makes debugging easier, as we shall see later.
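
The idea of a thin action routine that hands the real work to a subroutine can be illustrated outside Legato. The following is a minimal Python sketch, not Legato code: on_action, fetch_meta, and the fields dictionary are hypothetical stand-ins for the dialog procedure, get_url_meta_data, and the dialog controls.

```python
LOOKUP_BUTTON = 202  # mirrors ASL_URL_LOOKUP from the resource


def on_action(control_id, fields, fetch_meta):
    """Handle one control event; only the lookup button does any work.

    fields is a dict standing in for the dialog's edit controls, and
    fetch_meta is the subroutine that does the slow part (hypothetical).
    """
    if control_id != LOOKUP_BUTTON:
        return False  # cheap exit for every other control event
    url = fields.get("url", "")
    if not url:
        return False
    meta = fetch_meta(url)  # heavy work lives outside the handler
    if meta is None:
        return False  # lookup failed: leave the controls alone
    fields["text"], fields["cite"] = meta
    return True
```

Because fetch_meta is an ordinary function, it can be called and stepped through on its own, without the dialog running.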


Getting the Meta Data


The get_url_meta_data function consists of five sections: (i) get the target page data; (ii) get the title; (iii) get the cited host/domain; (iv) post-process the cite for certain domains; and (v) complete the preparation of the global variables title and cite. Here is the code:


int get_url_meta_data() {

    string              page, s1, s2;
    int                 rc, ix;

    title = ""; cite = "";

    page = HTTPGetString(url);
    if (page == "") { return GetLastError(); }
        
    ix = FindInString(page, "<title>");
    if (ix > 0) {
      title = GetStringSegment(page, ix + 7, 200);
      ix = InString(title, "<");
      if (ix > 0) { title[ix] = 0; }
      title = TrimPadding(title);
      }
        
    s1 = GetURIHost(url);
        
    if (s1 != "") {
      ix = FindInTable(hostlist, s1);
      if (ix < 0) {
        cite = s1;
        }
      else {
        cite = hostlist[ix][1];
        }
      }
            
    if (cite == "[1]") {
      ix = InString(title, " - YouTube");
      if (ix > 0) { title[ix] = 0; }

      s2 = ",\\\"author\\\":\\\"";
      ix = InString(page, s2);
      if (ix > 0) { 
        ix += GetStringLength(s2);
        cite = GetStringSegment(page, ix, 40);
        ix = InString(cite, "\\\"");
        if (ix > 0) { cite[ix] = 0; }
        }
      }

    title = EntitiesToUTF(title);
    title = UTFToAnsi(title);

    cite = "(" + cite + " " + GetLocalTime(DS_MMDDYYYY) + ")";

    return ERROR_NONE;
    }   

The first action in the routine is to clear the global cite (for a citation) and title variables and load the page at the URL into a string. If the load fails, an error code is returned (note that this is a quick script, so the action routine ignores the error and pressing the button does nothing). Assuming we have text, we get the title of the HTML document (which is located in the header between the <TITLE></TITLE> tags). This assumes that the author of the page employed best practices and actually used the TITLE element.
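
For readers who want to try the title step outside Legato, here is a rough Python sketch of the same approach: find the opening tag, take a bounded slice, cut at the next tag, and trim. The function name extract_title and the case-insensitive search are my assumptions for illustration; the Legato script above searches for the lowercase tag.

```python
import re


def extract_title(page, max_len=200):
    """Return the text of the first <title> element, or "" if there is none."""
    match = re.search(r"<title>", page, re.IGNORECASE)
    if match is None:
        return ""
    # Take a bounded slice after the opening tag, as the script does.
    segment = page[match.end():match.end() + max_len]
    # Cut at the next tag, which should be the closing </title>.
    end = segment.find("<")
    if end >= 0:
        segment = segment[:end]
    return segment.strip()


sample = "<html><head><TITLE> Field Day Recap - YouTube </TITLE></head></html>"
print(extract_title(sample))
```

Like the script, this sketch misses a title element that carries attributes, and it silently cuts titles longer than the slice.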


That takes care of returning one parameter: the title or text of the link. (The next version of Legato, 1.2i, improves the function HTMLHeaderGetTitle to allow the web source to be supplied as a string.)


The citation can be the host (domain name) or a translated value. The GetURIHost function pulls the domain from the URL. Assuming it was successful, we can use the FindInTable function to look for a translation in a table loaded at the start of the script:


int run(int f_id, string mode) {

    handle              hEO;
    string              code;
    string              s1;
    int                 c_x, c_y,
                        rc;

    if (mode != "preprocess") { return ERROR_NONE; }

    if (ArrayGetAxisDepth(hostlist) == 0) {
      hostlist = CSVReadTable(GetScriptFolder() + "Host List.csv");
      }

    . . .

If the hostlist variable is empty, the CSVReadTable function is used to load the list. The list is very simple:


Example hostlist


Or, as raw CSV text:


"www.youtube.com","[1]"
"www.africanews.com","Africanews"
"www.arrl.org","ARRL"
"blog.scoutingmagazine.org","Bryan On Scouting"
"hackaday.com","Hackaday"
"www.iaru-r2.org","IARU"
"www.radioworld.com","Radio World"
"www.southgatearc.org","SouthgateARC"
"www.wbrc.com","WBRC"

If you look closely, you will see the notation “[1]” on YouTube. This is used later in the citation. After looking in the first column for a host name, the translated value is set into the cite variable. If a translation cannot be located, then the raw domain name is used for the citation.
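
The lookup-with-fallback logic is easy to try in isolation. This is a small Python sketch under stated assumptions, not the Legato implementation: the host list is an inline string rather than a file, and load_host_table and cite_for_url are hypothetical names.

```python
import csv
import io
from urllib.parse import urlsplit

# A few rows in the same two-column format as the host list.
HOST_LIST_CSV = '''"www.youtube.com","[1]"
"www.arrl.org","ARRL"
"hackaday.com","Hackaday"
'''


def load_host_table(text):
    """Parse two-column CSV text into a host -> citation dictionary."""
    rows = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in rows if len(row) >= 2}


def cite_for_url(url, table):
    """Return the translated citation, or the raw host if none is listed."""
    host = urlsplit(url).hostname or ""
    return table.get(host, host)


table = load_host_table(HOST_LIST_CSV)
print(cite_for_url("https://hackaday.com/blog/", table))
print(cite_for_url("https://example.org/page", table))
```

A dictionary replaces the table search here, but the behavior is the same: a listed host is translated and an unlisted host is used as is.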


As mentioned, the notation [1] is used for YouTube. You can add your own exceptions, but this is mine for YouTube:


    if (cite == "[1]") {
      ix = InString(title, " - YouTube");
      if (ix > 0) { title[ix] = 0; }

      s2 = ",\\\"author\\\":\\\"";
      ix = InString(page, s2);
      if (ix > 0) { 
        ix += GetStringLength(s2);
        cite = GetStringSegment(page, ix, 40);
        ix = InString(cite, "\\\"");
        if (ix > 0) { cite[ix] = 0; }
        }
      }

In this section, I am relying on the format of the YouTube page, so the vendor could change the format at any time and possibly break my code. For now, this is a decent approach.


First, we want to remove some text from the title. We search for the text “ - YouTube” and truncate the string at that point. Other sites may also format their titles this way, but it is prominent with YouTube. For the citation, I really want the author (or channel name) as it appears on the page, not “YouTube”. To find the channel, I perform a sloppy string search within the obfuscated JavaScript to find the author. I tested about ten YouTube pages; the formatting seems consistent and has been working fine with this tool. The code gets the length of the match string and then captures about 40 characters of the web page, starting at the match position plus the length of the match string (I am assuming most channel names will fit in that area). Then it looks for the end of the string literal in the object and truncates the string there. There is an obvious fault in this logic when a name exceeds 40 characters, but, again, we can cross that bridge in a later version. At the end of the process, we have the citation name.
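
The marker search and the 40-character window can be reproduced in a few lines of Python. This is only a sketch of the same idea: extract_author is a hypothetical name, and the sample page text is invented to match the escaped pattern the script looks for, not taken from a real YouTube page.

```python
AUTHOR_MARKER = r',\"author\":\"'  # the literal text ,\"author\":\"
END_MARKER = r'\"'                 # escaped quote that closes the value


def extract_author(page, window=40):
    """Return the text after the author marker, cut at the closing quote."""
    pos = page.find(AUTHOR_MARKER)
    if pos < 0:
        return ""
    start = pos + len(AUTHOR_MARKER)
    segment = page[start:start + window]  # bounded capture, as in the script
    end = segment.find(END_MARKER)
    if end >= 0:
        segment = segment[:end]
    return segment


# Invented sample text in the escaped form the marker expects.
sample = r'{\"title\":\"Demo\",\"author\":\"Some Channel\",\"views\":1}'
print(extract_author(sample))
```

The sketch shares the fault noted above: when the name is longer than the window, the closing quote is never found and the result is simply the first 40 characters.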


Finally, we need to condition the title and create the complete citation with the date. My format places the citation in parentheses. The preparation of the title string is a little ugly since Legato does not have a single routine at this point to convert entities to ANSI text.
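
As a point of comparison, the same final step is sketched below in Python. The names finish_title and finish_cite are hypothetical, and the slash-separated month/day/year layout is my assumption about what DS_MMDDYYYY produces. Python strings are Unicode, so the separate conversion to ANSI has no counterpart here.

```python
import html
from datetime import date


def finish_title(raw_title):
    """Convert character entities such as &amp; into plain characters."""
    return html.unescape(raw_title).strip()


def finish_cite(name, today):
    """Wrap the citation name and date in parentheses."""
    # Assumed date layout: MM/DD/YYYY.
    return "(" + name + " " + today.strftime("%m/%d/%Y") + ")"


print(finish_title("Q&amp;A: &quot;Ham Radio&quot; "))
print(finish_cite("Hackaday", date(2020, 2, 28)))
```

Passing the date in as a parameter, rather than reading the clock inside the function, keeps the result repeatable when testing.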


Testing the Routine


Debugging code inside of dialog procedures is a bit cumbersome because the IDE does not allow code stepping in the dialog context. When I write a routine like get_url_meta_data, I usually write a test jig, for example (in the original post, blue text marks the code for the test jig):


    string              hostlist[][2];                          // Site Look up
    string              url, title, cite;                       // Working Dialog Info

    int                 get_url_meta_data                       ();

int main() {
    int                 rc;

    hostlist = CSVReadTable(GetScriptFolder() + "Host List.csv");

    url = "https://www.youtube.com/watch?v=wOikIWz4wgc&t=436s";

    rc = get_url_meta_data();

    AddMessage("Error   : 0x%08X", rc);
    AddMessage("URL     : %s", url);
    AddMessage("Title   : %s", title);
    AddMessage("Cite    : %s", cite);

    return 0;
    }


int get_url_meta_data() {

    string              page, s1, s2;
    int                 rc, ix;

    title = ""; cite = "";

    page = HTTPGetString(url);
    if (page == "") { return GetLastError(); }

    ix = FindInString(page, "<title>");
    if (ix > 0) {
      title = GetStringSegment(page, ix + 7, 200);
      ix = InString(title, "<");
      if (ix > 0) { title[ix] = 0; }
      title = TrimPadding(title);
      }
        
    s1 = GetURIHost(url);
        
    if (s1 != "") {
      ix = FindInTable(hostlist, s1);
      if (ix < 0) {
        cite = s1;
        }
      else {
        cite = hostlist[ix][1];
        }
      }
            
    if (cite == "[1]") {
      ix = InString(title, " - YouTube");
      if (ix > 0) { title[ix] = 0; }

      s2 = ",\\\"author\\\":\\\"";
      ix = InString(page, s2);
      if (ix > 0) { 
        ix += GetStringLength(s2);
        cite = GetStringSegment(page, ix, 40);
        ix = InString(cite, "\\\"");
        if (ix > 0) { cite[ix] = 0; }
        }
      }

    title = EntitiesToUTF(title);
    title = UTFToAnsi(title);

    cite = "(" + cite + " " + GetLocalTime(DS_MMDDYYYY) + ")";

    return ERROR_NONE;
    }   

This allows for setting different URL values, testing, and stepping through as needed. When run, the above jig outputs:


Results of test jig 


The Complete Revised Script


Here is the completed revised script:


                                                                /************************************************/

    string              a_class;
    string              a_text;
    string              a_url;
    string              a_cite;

    string              hostlist[][2];                          // Site Look up
    string              url, title, cite;                       // Working Dialog Info
    


                                                                /************************************************/
int setup() {

    string              fnScript;
    string              item[10];
    int                 rc;

    item["Code"] = "EXTENSION_QUICK_WEB_LINK";
    item["MenuText"] = "&Quick Web Link";
    item["Description"] = "<B>Quick Web Link</B>\r\rAdds a custom external web hypertext link.";
    item["Class"] = "DocumentExtension";

    rc = MenuFindFunctionID(item["Code"]);
    if (IsNotError(rc)) { return ERROR_NONE; }

    rc = MenuAddFunction(item);
    if (IsError(rc)) { return ERROR_NONE; }
    fnScript = GetScriptFilename();
    MenuSetHook(item["Code"], fnScript, "run");

    ArrayClear(item);
    item["KeyCode"] = "L_KEY_CONTROL";
    item["ChordAKey"] = "Q";
    item["FunctionCode"] = "EXTENSION_QUICK_WEB_LINK";
    item["Description"] = "Insert quick web link";

    QuickKeyRegister("Page View", item);

    return ERROR_NONE;
    }

                                                                /************************************************/

int main() {

    string              s1;
    int                 rc;

    s1 = GetScriptParent();
    if (s1 == "LegatoIDE") {
      rc = MenuFindFunctionID("EXTENSION_QUICK_WEB_LINK");
      if (IsError(rc)) {
        setup();
        }
      else {
        MenuDeleteHook("EXTENSION_QUICK_WEB_LINK");
        s1 = GetScriptFilename();
        MenuSetHook("EXTENSION_QUICK_WEB_LINK", s1, "run");
        }
      MessageBox('i', "Hook running on IDE");
      }
    return ERROR_NONE;
    }

                                                                /************************************************/

int run(int f_id, string mode) {

    handle              hEO;
    string              code;
    string              s1;
    int                 c_x, c_y,
                        rc;

    if (mode != "preprocess") { return ERROR_NONE; }

    if (ArrayGetAxisDepth(hostlist) == 0) {
      hostlist = CSVReadTable(GetScriptFolder() + "Host List.csv");
      }


    hEO = GetActiveEditObject();
    if (hEO == NULL_HANDLE) { return ERROR_CONTEXT; }

    rc = GetSelectMode(hEO);
    if (rc != EDO_NOT_SELECTED) {
      MessageBox('x', "Deselect to use this function");
      return ERROR_CANCEL;
      }
      
    c_y = GetCaretYPosition(hEO); c_x = GetCaretXPosition(hEO);

    rc = DialogBox("LinkQuery01Dlg", "lq_");
    if (IsError(rc)) { return rc; }

    a_class = "page";

    code = "<a " +
           "class=\"" + a_class + "\" " +
           "target=\"_blank\" "  +
           "href=\"" + a_url + "\"" +
           ">";
    code += ANSITextToXML(a_text);
    code += "</a>";
    code += " <font style=\"font-size: 8pt\">";
    code += ANSITextToXML(a_cite);
    code += "</font>";
    
    WriteSegment(hEO, code, c_x, c_y);

    return ERROR_NONE;
    }



                                                                /************************************************/

#beginresource

#define ASL_URL                                 201
#define ASL_URL_LOOKUP                          202
#define ASL_TEXT                                203
#define ASL_CITE                                204


LinkQuery01Dlg DIALOGEX 0, 0, 280, 118
STYLE DS_3DLOOK | WS_POPUP | WS_VISIBLE | WS_CAPTION
CAPTION "Add Quick Web Link"
FONT 8, "MS Shell Dlg"
{
 CONTROL "Link To:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 4, 30, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 36, 9, 236, 1, 0
 CONTROL "&URL:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 18, 30, 8
 CONTROL "", ASL_URL, "edit", ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 16, 170, 12
 CONTROL "Look Up", ASL_URL_LOOKUP, "button", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 225, 16, 40, 12, 0
 CONTROL "Parameters:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 37, 40, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 46, 42, 226, 1, 0
 CONTROL "&Text:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 53, 30, 8, 0
 CONTROL "", ASL_TEXT, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 51, 220, 12, 0
 CONTROL "&Cite:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 69, 30, 8, 0
 CONTROL "", ASL_CITE, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 67, 220, 12, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 6, 90, 268, 1, 0
 CONTROL "OK", IDOK, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 168, 96, 50, 14
 CONTROL "Cancel", IDCANCEL, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 223, 96, 50, 14
}

#endresource

int get_url_meta_data() {

    string              page, s1, s2;
    int                 rc, ix;

    title = ""; cite = "";

    page = HTTPGetString(url);
    if (page == "") { return GetLastError(); }

    ix = FindInString(page, "<title>");
    if (ix > 0) {
      title = GetStringSegment(page, ix + 7, 200);
      ix = InString(title, "<");
      if (ix > 0) { title[ix] = 0; }
      title = TrimPadding(title);
      }
        
    s1 = GetURIHost(url);
        
    if (s1 != "") {
      ix = FindInTable(hostlist, s1);
      if (ix < 0) {
        cite = s1;
        }
      else {
        cite = hostlist[ix][1];
        }
      }
            
    if (cite == "[1]") {
      ix = InString(title, " - YouTube");
      if (ix > 0) { title[ix] = 0; }

      s2 = ",\\\"author\\\":\\\"";
      ix = InString(page, s2);
      if (ix > 0) { 
        ix += GetStringLength(s2);
        cite = GetStringSegment(page, ix, 40);
        ix = InString(cite, "\\\"");
        if (ix > 0) { cite[ix] = 0; }
        }
      }

    title = EntitiesToUTF(title);
    title = UTFToAnsi(title);

    cite = "(" + cite + " " + GetLocalTime(DS_MMDDYYYY) + ")";

    return ERROR_NONE;
    }   

void lq_action(int c_id, int c_ac) {
    int                 rc;

    if (c_id == ASL_URL_LOOKUP) {
      url = EditGetText(ASL_URL);
      if (url != "") {
        rc = get_url_meta_data();
        if (IsNotError(rc)) {
          EditSetText(ASL_TEXT, title);
          EditSetText(ASL_CITE, cite);
          }
        }
      }
    }


int lq_validate() {
    string              parts[];
    string              s1;
    int                 rc;

    a_url = EditGetText(ASL_URL, "URL", EGT_FLAG_REQUIRED);
    rc = GetLastError();
    if (IsError(rc)) { return rc; }
    parts = GetURIComponents(a_url);
    s1 = MakeLowerCase(parts["scheme"]);
    if ((s1 != "http:") && (s1 != "https:")) {
      MessageBox('x', "Need an HTTP or HTTPS scheme on link.");
      return ERROR_SOFT | ASL_URL;
      }

    a_text = EditGetText(ASL_TEXT, "Text", EGT_FLAG_REQUIRED);
    rc = GetLastError();
    if (IsError(rc)) { return rc; }

    a_cite = EditGetText(ASL_CITE, "Cite", EGT_FLAG_REQUIRED);
    rc = GetLastError();
    if (IsError(rc)) { return rc; }
    return ERROR_NONE;
    }

Conclusion


I have been using this since the posting of the last blog and find it works quite well. This demonstrates how a specialized tool can be added to the application to perform certain tasks that would otherwise have many steps or be very time-consuming.


 


Scott Theis is the President of Novaworks and the principal developer of the Legato scripting language. He has extensive expertise with EDGAR, HTML, XBRL, and other programming languages.

Additional Resources

Novaworks’ Legato Resources

Legato Script Developers LinkedIn Group

Primer: An Introduction to Legato 

Posted by The Novaworks Team in Development at 17:12