Building secure websites is a significant expense for many companies, especially when the sites collect private information. Companies need to hire web developers with specific training in protecting against XSS (Cross-Site Scripting). Industries dealing with health and financial data often spend millions auditing individual web pages for vulnerabilities. Code injection and XSS account for a large share of modern attacks on the web. Detection software suites exist, as do many third-party companies that specialize in securing and auditing websites. These external services often far exceed the cost of development, but they are a necessary expense for banks, e-commerce portals and HIPAA-compliant websites.
Modern frameworks such as React, Angular, and Vue all rely on JavaScript to manipulate the DOM, and each has a lifecycle in which the HTML is modified in real time in response to the user's interactions. While server-side rendering can hide most of the JavaScript responsible for rendering and data injection, it cannot protect code that the user interacts with in the browser.
The biggest vulnerability in these examples is code injection targeting the code that makes HTTP requests or hides certain UI and functionality from unauthenticated users. Many of these vulnerabilities can be exploited by installing a Greasemonkey script, or simply by inspecting the page and injecting a <script> tag at the end of the <body>. Other types of attacks, such as manipulating memory, cookies and HTML elements on the page, can allow attackers to gain access to websites or attack them maliciously.
In an ideal world, if JavaScript code and its execution could be hidden from the client's machine so that only the resulting DOM is displayed, XSS and client-side script tampering would be nearly impossible.
Users can only exploit the JavaScript and page elements if they have access to them. What I am proposing is to stream HTML to the client machine and stream the user's interactions back to a remote server. The remote server would mimic the actions taken by the user and replicate them in a safe environment. This would prevent JavaScript injection and hide all JavaScript execution from the client. In situations like the one in the image above, where interactions and JavaScript code change the HTML, the new HTML would be streamed back to the client for rendering, skipping the execution phase on the client.
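To make this round trip concrete, the traffic can be modelled as two kinds of message: descriptions of DOM events sent from the client, and HTML snapshots sent back by the server. The following is a minimal sketch assuming JSON messages over a WebSocket; the type names, fields and helper functions are hypothetical and not part of any existing library. Rejecting unknown message shapes on the server is what keeps the client limited to describing interactions rather than running code.

```typescript
// Hypothetical wire format: the client only sends descriptions of DOM
// events, and the server only sends rendered HTML back.
type ClientEventMessage =
  | { kind: "click"; selector: string }
  | { kind: "input"; selector: string; text: string };

type ServerHtmlFrame = { kind: "html"; html: string };

// Client side: serialize an event description before sending it.
function encodeClientEvent(msg: ClientEventMessage): string {
  return JSON.stringify(msg);
}

// Server side: wrap the latest (already filtered) HTML for the client.
function encodeHtmlFrame(html: string): string {
  const htmlFrame: ServerHtmlFrame = { kind: "html", html };
  return JSON.stringify(htmlFrame);
}

// Server side: parse and validate an incoming message. Anything that is not
// one of the known event shapes is rejected with null, so a client cannot
// smuggle other instructions to the headless browser.
function decodeClientEvent(raw: string): ClientEventMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const fields = parsed as Record<string, unknown>;
  const kind = fields["kind"];
  const selector = fields["selector"];
  const text = fields["text"];
  if (typeof selector !== "string") return null;
  if (kind === "click") return { kind: "click", selector };
  if (kind === "input" && typeof text === "string") {
    return { kind: "input", selector, text };
  }
  return null;
}
```

A message such as `{"kind":"eval","selector":"body"}` decodes to `null`, so only the listed interaction types ever reach the server-side browser.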
The procedure for doing this is conceptually simple. It requires a server-side machine running a headless browser, which can be controlled via popular frameworks such as Puppeteer or CasperJS. A WebSocket connection can then link the headless browser's APIs to the client's interactions. Mapping DOM events to socket events allows the user to interact on their own machine while the same actions are replicated server-side in the headless browser. As the user continues to use the website, the headless browser produces new HTML, which is sent back to the user in real time as the page changes. A filter can be applied at this step to remove script tags and sensitive information from the response. The only code that needs to run on the client side is a small, safe client socket package capable of streaming DOM events and receiving HTML.
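The server side of this procedure could look roughly like the sketch below. It assumes a Puppeteer-style page object and only relies on three methods whose Puppeteer signatures it mirrors (`page.click(selector)`, `page.type(selector, text)` and `page.content()`); the `HeadlessPage` interface, the event type and the `filterHtml`/`replayEvent` helpers are hypothetical. The regex-based filter is deliberately simplified: it shows where filtering happens, not how to build a safe sanitizer.

```typescript
// Minimal structural interface mirroring the three Puppeteer Page methods
// this sketch uses. The interface name itself is hypothetical.
interface HeadlessPage {
  click(selector: string): Promise<void>;
  type(selector: string, text: string): Promise<void>;
  content(): Promise<string>;
}

// Hypothetical event shape received from the client socket package.
type ReplayableEvent =
  | { kind: "click"; selector: string }
  | { kind: "input"; selector: string; text: string };

// Simplified filter: drops <script> elements and inline on* event handler
// attributes from the serialized DOM. Regexes are NOT a robust HTML
// sanitizer; a production version should use a real HTML parser.
function filterHtml(html: string): string {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "")
    .replace(/\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "");
}

// Replay one client event in the headless browser, then return the
// filtered HTML that should be streamed back to the client.
async function replayEvent(page: HeadlessPage, evt: ReplayableEvent): Promise<string> {
  if (evt.kind === "click") {
    await page.click(evt.selector);
  } else {
    await page.type(evt.selector, evt.text);
  }
  return filterHtml(await page.content());
}
```

In a deployment, `replayEvent` would be called from the socket server's message handler for each validated client event, with its result sent back to that client only.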
The biggest challenge with this approach is the overhead of rendering each user's session server-side and streaming it, which requires large amounts of bandwidth and computing resources to serve many users at once. A single instance of headless Chrome can use up to 120 MB of memory, depending on the content of the website. Because of this, a real implementation of this idea in a production environment would likely be used only for hyper-critical parts of a web page.
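The 120 MB figure translates directly into a capacity limit. As a rough sketch, assuming one headless browser instance per user session and a fixed amount of memory reserved for the operating system and the application itself (the function names are hypothetical), the number of sessions a server can hold is the usable memory divided by the per-session cost:

```typescript
// Rough capacity estimate, assuming one headless browser instance per user
// session. perSessionMb is an input (e.g. the ~120 MB upper bound above),
// not a measured value.
function maxConcurrentSessions(
  serverMemoryMb: number,
  reservedMb: number,
  perSessionMb: number,
): number {
  if (perSessionMb <= 0) throw new Error("perSessionMb must be positive");
  const usableMb = serverMemoryMb - reservedMb; // memory left after OS/app overhead
  return Math.max(0, Math.floor(usableMb / perSessionMb));
}

// Total memory needed to serve a given number of concurrent sessions.
function memoryForSessionsMb(sessions: number, perSessionMb: number): number {
  return sessions * perSessionMb;
}
```

For example, `maxConcurrentSessions(16384, 2048, 120)` returns 119, so a 16 GB server with 2 GB reserved could hold about 119 sessions at the 120 MB upper bound, while `memoryForSessionsMb(1000, 120)` shows that 1,000 concurrent users would need about 120,000 MB.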
Latency is another problem. Websites with heavy JavaScript-driven DOM manipulation would experience a delay between an action on the client side and the updated HTML that is eventually sent back. While modern WebSocket implementations add very little overhead per message, every interaction still costs a full network round trip plus the time the headless browser needs to process the event, which results in a noticeably worse user experience.
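A rough latency budget makes this cost visible. The sketch below, with hypothetical input names and illustrative (not measured) numbers, adds the network round-trip time, the server's processing time, and the time to download the returned HTML:

```typescript
// Rough interaction-to-update latency for the streaming design: one network
// round trip, plus the time the headless browser spends handling the event
// and serializing the DOM, plus the time to transfer the HTML payload.
interface LatencyInputs {
  roundTripMs: number; // network RTT between client and server
  serverProcessingMs: number; // event replay + DOM serialization + filtering
  htmlBytes: number; // size of the HTML sent back
  bandwidthBytesPerSec: number; // client download bandwidth
}

function interactionLatencyMs(inputs: LatencyInputs): number {
  if (inputs.bandwidthBytesPerSec <= 0) throw new Error("bandwidth must be positive");
  const transferMs = (inputs.htmlBytes * 1000) / inputs.bandwidthBytesPerSec;
  return inputs.roundTripMs + inputs.serverProcessingMs + transferMs;
}
```

With a 40 ms round trip, 20 ms of server processing and a 100,000-byte HTML update over a 1,250,000 bytes/s (10 Mbit/s) connection, the estimate is 40 + 20 + 80 = 140 ms per interaction. The transfer term grows with the size of each update, so re-sending large portions of the page on every interaction makes the delay worse.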
Potential use cases for this approach include:
- Protecting sensitive form submissions that are at risk of script or HTML injection (e.g. banking, government websites, payment portals).
- Recording user sessions for marketing analysis and bug reproduction.
- Removing sensitive metadata from HTML (this can also be achieved with a regular implementation of server-side rendering, without the need to stream).
- Allowing heavy websites to run on lower-end machines, since little JavaScript executes on the client.
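Session recording falls out of the design almost for free, because every interaction already reaches the server as a message. A minimal sketch, assuming the server can attach a timestamp to each incoming message (the `SessionRecorder` class and its methods are hypothetical), is an append-only log that can later be replayed in order, for example against a fresh headless page when reproducing a bug:

```typescript
// Hypothetical session recorder: every client message already passes through
// the server, so it can be logged with a timestamp and replayed later.
interface RecordedEvent<T> {
  atMs: number; // milliseconds since the session started
  payload: T; // the decoded client message
}

class SessionRecorder<T> {
  private readonly startMs: number;
  private readonly events: RecordedEvent<T>[] = [];

  constructor(startMs: number) {
    this.startMs = startMs;
  }

  // Append one message, stamped relative to the session start.
  record(payload: T, nowMs: number): void {
    this.events.push({ atMs: nowMs - this.startMs, payload });
  }

  // Re-apply every recorded message in its original order (without the
  // original delays), e.g. against a fresh headless page. Returns the count.
  async replay(apply: (payload: T) => Promise<void>): Promise<number> {
    for (const entry of this.events) {
      await apply(entry.payload);
    }
    return this.events.length;
  }

  // Copy of the log, e.g. for storing it alongside a bug report.
  entries(): RecordedEvent<T>[] {
    return this.events.slice();
  }
}
```

This replay preserves the order of events but not the delays between them; the stored `atMs` offsets would be needed to reproduce timing-sensitive bugs.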
I am currently working as a Fullstack Developer for the healthcare technology company Prizm Media and am a student at the University of British Columbia, where I am obtaining a degree in Business & Computer Science (BUCS). I am also an open source developer and have written several libraries, including composite-data, react-border-wrapper, and save-to-activity, which have found their way into multi-million dollar products. I have experience writing security-focused software, having been involved in creating a handful of HIPAA-compliant applications, including the mail-order prescription service RxtoMe and the lead generation website Festive Health. This article is written as a proposal for my enrolment in CPSC 448 for the September 2019 session at UBC.
Fullstack Software Developer | Student | Open Source Developer