We didn’t have an API, so we built a bot

Abdulkadir A
9 min read · Jan 3, 2021

Now, the process you’re about to read about is quite specific to the project described, so read this post with the mindset that not everything I’ve described will work for you (immediately). But if, at the end of this post, you develop a mental framework for solving similar challenges, then I’ll consider the time spent documenting this well spent.

I had a client running an application built with a web app generator for managing their records (you know, CRUD). The summarized process of generating web applications follows. You connect to your database by specifying connection parameters (quite a few DBMSs are supported, from MySQL and Postgres to SQL Server). The generator introspects your schema and performs a best-effort strategy to scaffold List, Read, Create, Update, and Delete page templates from your tables. Within the project editor, you can change field types for columns: for example, changing phone number fields to accept digits only even though the backing field in the table is a character type, changing fields to show file upload elements, or even designing cascading drop-down fields (Country => State type interactions). In the end, you specify an output directory, and the application generates a complete web application, with everything from authentication to navigation functionality, into the directory specified. These files can then be uploaded to your web server’s hosting directory, and voilà! You’ve got yourself an honest-to-goodness CRUD application.

At this point, I should point out that my client had the PHP web application output generator, but that’s kind of beside the main point for 2 reasons: a decent CRUD web application can be achieved in pretty much any language, and the product the client used had generators for other languages.

The CRUD application generator exposed extension points called events that allowed one to write custom code. Events are triggered in response to both client-side and server-side actions. Here’s a quick description of some of the events (I’m paraphrasing a few):

OnCustomHTMLHeader
“This event occurs when the head section of the page is being generated. Use it to provide additional information for the HEAD section of the page (such as keywords, author, or description).”

OnCustomRenderColumn
“This event occurs before column rendering and allows you to completely replace the cell content. It is an extremely useful event for conditional rendering or embedding third-party components to extend standard functionality.”

OnGetFieldValue
“This event occurs on displaying a value of a field. It allows you to replace the actual value of a field with your own.”

OnBeforeInsertRecord
“This event occurs when the Insert command is executed, and before the actual insertion.”

The problem
My client spends months every year collecting data from their customers, then cleaning and entering the data into their database using the generated CRUD app. At some point during the year, they are required to, in essence, re-enter most of the 25–30 fields of data already collected, cleaned, validated, and stored in their database into multiple (3 at last count) different 3rd party websites, for various reasons that are beside the point for this blog post; they are mainly applications for those customers for various services. This data transfer was a per-record process of entry, submission, validating errors, and tracking reference numbers, to be completed within 2–4 weeks. It wasn’t cost-effective to keep people around only for this period, so automation to the rescue.

Each 3rd party website was different: each required different data fields, had different processes, and most importantly, exposed absolutely no APIs for this data entry/transfer.

The solution
The solution is outlined in the following parts:

  • Design an interop application to inject code into each 3rd party website
  • Design a 2-part controller logic to send data from the CRUD web app to our code in the 3rd party website, perform the form logic there, and send responses to our interop app or the CRUD web app
  • Use the web app generator events to embed custom code to connect to an interop app
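
To make the moving parts concrete, here is a hypothetical sketch of the message shapes that flow over the WebSocket connections. These names are mine for illustration, not the actual protocol:

// Hypothetical message shapes for the WebSocket traffic between the
// CRUD web app, the Controller injected into the 3rd party page, and
// the Interop desktop app. All names here are illustrative.

// A record serialized by the CRUD web app for submission.
interface RecordPayload {
  recordId: string;
  targetSite: string;             // which 3rd party website to submit to
  fields: Record<string, string>; // e.g. { surname: "...", address: "..." }
}

// A native action the Controller asks the Interop app to perform,
// e.g. run an AutoHotkey script against a file-upload dialog.
interface InteropAction {
  action: "typeFilePath";
  argument: string; // local path the Interop app downloaded earlier
}

// The Controller reporting an outcome back to the CRUD web app.
interface SubmissionResult {
  recordId: string;
  status: "submitted" | "validationError";
  referenceNumber?: string; // present on success
  errors?: string[];        // validation messages scraped from the page
}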

Interop:
This piece, I think, is the most critical part of the whole implementation. The interop app is a native Windows desktop application, written in C#, that hosts a web proxy server which intercepts all web traffic from each 3rd party website to our browser and injects our JavaScript controller code. In the initial iteration of the project, we did a little too much here; we later realized it could be simpler (I describe this a bit more at the end of this post).
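
The real injection happens in C# through the proxy’s response hooks, but the rewrite step itself is small; here is a minimal sketch of the idea, with placeholder paths:

// Minimal sketch of the injection the proxy performs on every HTML
// response from a 3rd party website: append our controller script tags
// just before </body> so they execute in the page's own context.
// The URL below is a placeholder.
const controllerScriptPaths = [
  "https://crud-app.example.local/build/controller.js",
];

function injectController(htmlBody: string): string {
  // Leave non-HTML (or fragment) responses untouched.
  if (!htmlBody.includes("</body>")) return htmlBody;
  const tags = controllerScriptPaths
    .map((src) => `<script src="${src}"></script>`)
    .join("");
  return htmlBody.replace("</body>", `${tags}</body>`);
}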

Controller:
A set of functionality in JavaScript (and HTML w/ CSS to zazz it up if necessary) that’s injected into the 3rd party website by the Interop proxy server. As the name implies, the code is meant to establish a connection to a quasi command-and-control server to receive instructions on what actions to perform within the 3rd party web page. For example, when a record is ready for submission to the 3rd party website from the CRUD web app, a button on the record triggers an action to begin submitting the record to the appropriate 3rd party website. That handler in the CRUD web application serializes the record fields over the WebSocket connection established on page load, to the controller in the 3rd party web page. If the controller action queue was empty, it triggers a web page navigation to the appropriate entry point for the form on the 3rd party website. Once the entry point loads, the Controller checks if any records are in its queue (well, yeah, that’s why the entry point was loaded), deserializes the record, and begins form entry. Maybe you’ve realized this already, but this is fundamentally the only part of our infrastructure that knows about the mechanics of each 3rd party website, from the entry form and validation errors to what a successful submission looks like.
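
A minimal sketch of that receive-and-queue loop, assuming the queue survives page navigations via sessionStorage (how ours persisted it is an implementation detail; every name and URL below is illustrative):

// Hypothetical sketch of the Controller's command loop inside a 3rd
// party page. The queue is kept in sessionStorage so it survives the
// navigations the form flow requires.
const QUEUE_KEY = "bot.recordQueue";

function loadQueue(): Array<Record<string, string>> {
  return JSON.parse(sessionStorage.getItem(QUEUE_KEY) ?? "[]");
}

function saveQueue(queue: Array<Record<string, string>>): void {
  sessionStorage.setItem(QUEUE_KEY, JSON.stringify(queue));
}

// Connect back to our quasi command-and-control endpoint (placeholder).
const socket = new WebSocket("wss://crud-app.example.local/controller");

socket.onmessage = (event) => {
  const queue = loadQueue();
  const wasIdle = queue.length === 0;
  queue.push(JSON.parse(event.data));
  saveQueue(queue);
  // An idle Controller kicks off the flow by loading the entry form;
  // our script is re-injected on that page and drains the queue there.
  if (wasIdle) location.href = "/EntryForm"; // placeholder entry point
};

// On every page load: if we are on the entry form with queued work,
// begin form entry (enterRecord is sketched in the next section).
declare function enterRecord(record: Record<string, string>): void;
if (location.pathname.includes("/EntryForm") && loadQueue().length > 0) {
  enterRecord(loadQueue()[0]);
}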

Because the controller logic knows the page inside and out, it performs the individual field entries. If the 3rd party website form requires a Surname entry, the controller retrieves the value to go in the field from the deserialized record that arrived over the WebSocket connection (remember?), then does a DOM lookup for the Surname field and sets the value: you know, $("[name='surname']").val(record["surname"]) if you use jQuery, or document.getElementsByName("surname")[0].value = record["surname"] if direct DOM access is your style. The same goes for the address or occupation fields. For fields that require an API to set, for example a select2 field, well, we looked up the select2 documentation for the API to set a value, which in this case was $("select#gender").select2("val", 2). For, say, Telerik fields, again, look up their documentation and find the setter functionality.
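
In practice this becomes a per-site map from record keys to setter functions, so plain inputs, select2 widgets, and vendor components each get the setter their documentation prescribes. A hypothetical sketch (field names and selectors are illustrative):

declare const $: any; // jQuery + select2, already loaded by the 3rd party page

// Hypothetical per-site field setters: one small function per field,
// using whichever API that field type requires.
type Setter = (value: string) => void;

const setters: Record<string, Setter> = {
  surname: (v) => {
    const el = document.getElementsByName("surname")[0] as HTMLInputElement;
    el.value = v;
  },
  // select2 fields need the widget's own API, not a plain value set.
  gender: (v) => $("select#gender").select2("val", v),
};

function enterRecord(record: Record<string, string>): void {
  for (const [key, set] of Object.entries(setters)) {
    if (key in record) set(record[key]);
  }
}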

When a form is completed, the controller finds the submit button and “clicks” it, twiddles its thumbs in wait, and continues the process for any number of steps required to complete a form. One of the 3rd party websites was a wizard with 8 steps :O. IKR! When a record is completely submitted, how do we know? We know the mechanics of the form submission in the controller code, so we can, say, look at the current URL right after submitting a form to see if we are at a complete/success URL, or whatever the mechanics may be for your case. Once our form was submitted, we sent a reference number back to the CRUD web app so it would mark the record as having been successfully submitted, or trigger any number of downstream actions.
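
A sketch of that submit-and-verify step; the success URL fragment and reference number selector are stand-ins:

// Hypothetical sketch: "click" submit, let the page navigate, and on
// the next injection of our script check whether we landed on a
// success page, reporting the reference number back if so.
function submitForm(): void {
  const btn = document.querySelector("button[type=submit]") as HTMLButtonElement;
  btn.click();
}

function reportOutcome(socket: WebSocket, recordId: string): void {
  if (location.href.indexOf("Submission_Complete") > -1) { // placeholder URL check
    const referenceNumber =
      document.querySelector("#refNo")?.textContent?.trim() ?? ""; // placeholder
    socket.send(JSON.stringify({ recordId, status: "submitted", referenceNumber }));
  }
}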

The pairing of the controller and the interop app can be a powerful combination. Let me explain a scenario. On one of the 3rd party websites, the 8-stepped one, one of the steps requires us to upload a photograph of the customer. When the controller recognizes we’ve arrived at the photograph upload page (our simple case: href.indexOf('Manage_UploadPicture') > -1), we find the file upload input in the DOM, trigger a click of the element, wait for a set time, then make a call over the WebSocket to the interop app with an action we’d like it to perform. In this case, the interop app runs an AutoHotkey script to type out an argument passed to it, which is a path to an image it had downloaded to a local directory when the request was sent to submit the form. The AutoHotkey script is quite simple.
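
On the controller side, that recognize-the-page-and-ask-Interop step looks roughly like this hypothetical sketch (the selector, action name, and delay are all illustrative):

// Hypothetical sketch: on the photo-upload step, open the native file
// dialog, then ask the Interop app to type the local image path into it.
declare const interopSocket: WebSocket; // connection to the Interop app

function handleUploadStep(localImagePath: string): void {
  if (location.href.indexOf("Manage_UploadPicture") === -1) return;
  const fileInput = document.querySelector("input[type=file]") as HTMLInputElement;
  fileInput.click(); // opens the native file dialog
  // Give the dialog a moment to take focus before Interop starts typing.
  setTimeout(() => {
    interopSocket.send(
      JSON.stringify({ action: "typeFilePath", argument: localImagePath })
    );
  }, 1500);
}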

We can trigger the form submission after the interop app tells the Controller over the WebSocket connection that it is done entering the upload path.

Event code in our CRUD web app:
Using our generator events, we were able to render buttons that trigger record submission to the interop app if the record requires submission. This may have been the simplest part of the whole system, because it mostly relayed actions to the controller after a connection was established.
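
A sketch of the client-side end of that relay; the real buttons were rendered through the generator’s events, so the wiring below is illustrative:

// Hypothetical sketch of what the event-rendered button does in the
// CRUD web app: serialize the record and relay it over the WebSocket
// opened on page load. Attribute names are illustrative.
declare const interopSocket: WebSocket;

document.querySelectorAll("button[data-submit-site]").forEach((btn) => {
  btn.addEventListener("click", () => {
    const el = btn as HTMLElement;
    interopSocket.send(
      JSON.stringify({
        targetSite: el.dataset.submitSite,
        fields: JSON.parse(el.dataset.record ?? "{}"),
      })
    );
  });
});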

Deployment & Usage
This solution was initially deployed after the first 3rd party website came online and the regulatory requirement to enter the customers into it was established. Soon after I teased a demo of the solution, the client was in love with it and wanted it complete, or at least usable, as soon as possible, because they could see the immense potential the solution offered. The solution reduced the number of staff they had to train and have on hand for the 2–4 week window the 3rd party websites were open. It also allowed them to complete their applications to the 3rd party websites at whatever pace they scheduled, which could be hours to a few days, instead of weeks with many users working overtime every day for the duration. Best of all, the submissions to the 3rd party websites could all be performed by 1 or 2 users; at some point it was actually performed by just 1 user. The data submitted was virtually free of errors, because the bot was only copying already-cleaned data from the client database to the 3rd party services.

When 2 other 3rd party services came online, the client knew they had to have the same automation bot working for those sites as soon as possible, again due to the immense advantages described above. Because the Controller logic was the only piece aware of the mechanics of each 3rd party website’s submission, that was where our changes were focused.

The Interop application’s existence preceded the automation bot. It was originally designed for communicating with hardware peripherals connected to the client user’s PC (think various scanners). The Interop application was designed to be modular, and every major piece of functionality was designed as a loadable plugin: think ScannerAPlugin, ScannerBPlugin, etc. Customer data would be scanned, processed by the appropriate plugin, and sent to the CRUD web app UI over a WebSocket connection. Every time the CRUD web app was loaded in the client browser, it would establish a WebSocket connection to the Interop app running locally.

When the 3rd party website requirements went live, most of what was required was the creation of a new set of Plugins for each 3rd party website.

Conclusion

With the parts described above, I’ve only outlined the major pieces of functionality required to make this work. But this could be extended a whole, whole lot more. For our next iteration, after 4 years of running the system described above, we plan to embed UI elements in the 3rd party web pages so a user can track the progress of an individual submission, view the submission queue, see log notifications, etc.

Because we can render our code into the 3rd party website without asking them to change their code for us, we can do pretty much anything we want on their webpage.

Admittedly, there are many other ways this solution could be architected. We could have moved the native proxy server to its own physical/logical server and made the communication between the client PC and the Interop app remote. Of course, like all choices we make in this line of work, that comes with its own set of challenges.

I must admit this was quite a fun project to work on. It brought a unique set of challenges that I was able to surmount relatively quickly by breaking down the problem, identifying the required bits, and keeping each module as simple as possible. Because the desktop application had to be deployed to each user’s PC, if most of the application lived there, managing updates and keeping iteration times as low as possible would have come with more challenges. In my case, the controller code was a CoffeeScript + React application, built and deployed side by side with the CRUD web application; the Interop app simply embeds the paths to the controller code in the 3rd party web page.
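
A hypothetical sketch of that embedding step, with the controller paths hard-coded the way the current Interop clients ship them (paths are placeholders):

// Hypothetical sketch: today the Interop app ships with the controller
// bundle paths baked in, one set per 3rd party website, so adding
// functionality means redeploying the Interop clients.
const scriptPaths: Record<string, string[]> = {
  "siteA.example.com": ["https://crud-app.example.local/build/siteA.js"],
  "siteB.example.com": ["https://crud-app.example.local/build/siteB.js"],
};

function controllerTagsFor(host: string): string {
  return (scriptPaths[host] ?? [])
    .map((src) => `<script src="${src}"></script>`)
    .join("\n");
}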

Version next will use Webpack’s entrypoints.json to look up the entrypoint for each 3rd party website and have that become my scriptPaths in the code snippet above; that way, adding more controller functionality would not require a change to the Interop clients already deployed.
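
A hedged sketch of that lookup; it assumes a manifest shaped roughly like { entrypoints: { siteA: { js: [...] } } }, which depends on the plugin emitting it:

// Hypothetical sketch: build scriptPaths from entrypoints.json at
// startup so already-deployed Interop clients pick up new controller
// bundles without an update. The manifest shape is an assumption.
interface EntrypointsManifest {
  entrypoints: Record<string, { js: string[] }>;
}

async function loadScriptPaths(
  manifestUrl: string // e.g. the CRUD app's /build/entrypoints.json
): Promise<Record<string, string[]>> {
  const response = await fetch(manifestUrl);
  const manifest = (await response.json()) as EntrypointsManifest;
  const paths: Record<string, string[]> = {};
  for (const [site, entry] of Object.entries(manifest.entrypoints)) {
    paths[site] = entry.js;
  }
  return paths;
}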

This was a long one, thank you for stopping by.

In case you’re wondering what the complete stack entailed, it was C#, Titanium Web Proxy, SignalR, NancyFX, PHP, MySQL, Doctrine, Silex, Tesseract, EmguCV, AutoHotkey, React, Webpack, CoffeeScript, and TypeScript, all of which I’m comfortable with, and more.

Plug:

I think about, and can help you reason about, building application architectures like this and more. So if you are interested in hiring an experienced developer in .NET (C#), TypeScript, React, and Postgres, kindly send me an email at abdulkadirna [at] gmail [dot] com, or reach me via Twitter @abdulkadirna.
