Despite all of our investments in security tools, the codebase can be the weakest link for any organization’s cybersecurity. Sanitizing and validating inputs is usually the first layer of defense.
Attackers have been using classic flaws for years with a pretty high success rate. While advanced threat actors have more sophisticated approaches such as adversarial machine learning, advanced obfuscation, and zero-day exploits, classic attack techniques such as SQL injection, cross-site scripting (XSS), remote file inclusion (RFI) and directory traversal are still the most common attacks.
These techniques are often the first step on the way to privilege escalation and lateral movement. That’s why developers must sanitize and validate data correctly before processing transactions or saving any entry in a database.
Here we’ll focus on sanitizing and validating inputs, but other elements such as a server’s configurations must also be taken into account to properly secure forms.
What is the Difference Between Sanitizing and Validating Input?
Validation checks whether an input, say on a web form, complies with specific policies and constraints (for example, that a field contains only digits, or that it contains no single quotation marks). For example, consider the following input:
<input id="num" name="num" type="number" />
If there’s no validation, nothing prevents an attacker from exploiting the form by entering unexpected inputs instead of the expected number. They could also try to inject code that runs later if submitted values are stored in a database, which is pretty common.
To prevent such a bad situation, developers must add a validation step where the data is inspected before proceeding. For example, using a popular language like PHP, you can check the data type, the length, and many other criteria.
Sanitizing consists of removing or neutralizing unsafe characters in user inputs, whereas validating checks whether the data is in the expected format and type. In other words, sanitizing modifies the input so that it is safe to display or to insert into a database, while validating only accepts or rejects it.
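To make the distinction concrete, here is a minimal Python sketch (Python is used for the added examples in this article; the same ideas apply in PHP). The function names, the 1 to 100 range, and the 50-character cap are assumptions chosen for illustration, not rules from any standard.

```python
def validate_quantity(raw: str, minimum: int = 1, maximum: int = 100) -> int:
    """Validate: accept the value only if it is a whole number in range."""
    text = raw.strip()
    # Check type and length first: ASCII digits only, at most 9 characters.
    # This rejects signs, decimals, exponents such as "1e3", and injected text.
    if not (text.isascii() and text.isdecimal()) or len(text) > 9:
        raise ValueError("quantity must be a whole number")
    value = int(text)
    if not minimum <= value <= maximum:
        raise ValueError(f"quantity must be between {minimum} and {maximum}")
    return value


def sanitize_name(raw: str, max_length: int = 50) -> str:
    """Sanitize: drop non-printable characters, trim whitespace, cap the length."""
    cleaned = "".join(ch for ch in raw if ch.isprintable())
    return cleaned.strip()[:max_length]
```

The validator never changes the value: it either returns a clean integer or raises an error. The sanitizer always returns something, which is why it should not be the only check.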
Why You Should Use Input Sanitization and Validation
The most common technique used against weak inputs is probably cross-site scripting (XSS), in which attackers inject malicious scripts into otherwise trustworthy websites.
Some XSS attacks are more obvious than others, which means that even if you take the time to sanitize and validate your inputs, a skilled attacker might still find a way to inject malicious code under specific conditions.
A classic attack demo consists of injecting the following script into a weak input, where the alert('XSS') call stands in for arbitrary JavaScript:
<script>alert('XSS')</script>
If the content of the input is displayed on the page (or elsewhere), the attacker can execute arbitrary JavaScript on the targeted website. The typical case is a vulnerable search input that displays the search term on the page:
https://mysite.com/?s=<script>alert('XSS')</script>
It gets worse if the malicious entry is stored in the database. The demo code might look fun to play with, but in real-world conditions attackers can do a lot with JavaScript, including stealing cookies.
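The vulnerable search page described above can be sketched in a few lines of Python. Both function names are hypothetical; the only library call is html.escape() from the standard library, which converts characters such as <, >, & and quotes into HTML entities.

```python
import html


def render_heading_unsafe(term: str) -> str:
    # Vulnerable: the search term is placed into the page as-is.
    return f"<h1>Results for: {term}</h1>"


def render_heading_safe(term: str) -> str:
    # Safer: the term is escaped for an HTML text context just before output.
    return f"<h1>Results for: {html.escape(term)}</h1>"


payload = "<script>alert('XSS')</script>"
print(render_heading_unsafe(payload))  # the script tag survives intact
print(render_heading_safe(payload))    # the tag is rendered as harmless text
```

This only covers HTML text content. Values placed in attributes, URLs, or inline scripts need escaping suited to those contexts.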
When Not to Use Sanitization
The biggest problem with sanitization is the false impression of security it might give. Stripping unwanted chars and HTML tags is only one layer of checking. It’s often poorly executed and removes too much information like legitimate quotes and special chars while it does not cover all angles of attack. You cannot apply generic rules blindly.
The context is the key, which includes the programming languages in use. More on this later, but it’s important to follow a principle called “escape late” (for example, just before output) because you know the exact context where the data is used.
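As a rough illustration of why context matters, the following Python sketch escapes the same value differently depending on where it will be output. The three helper names are assumptions made for this example, and the JavaScript helper is a simplification rather than a complete defense.

```python
import html
import json
from urllib.parse import quote


def for_html_text(value: str) -> str:
    # HTML body or attribute value: encode <, >, & and both quote characters.
    return html.escape(value, quote=True)


def for_url_param(value: str) -> str:
    # Query-string value: percent-encode everything that is not unreserved.
    return quote(value, safe="")


def for_js_string(value: str) -> str:
    # Inline script: json.dumps yields a quoted string literal; "<" is also
    # escaped so that "</script>" cannot close the surrounding tag.
    return json.dumps(value).replace("<", "\\u003c")


user_value = "</script><b>hi</b> & bye"
print(for_html_text(user_value))
print(for_url_param(user_value))
print(for_js_string(user_value))
```

Escaping at the point of output means the right helper can be chosen, because the destination is known at that moment.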
In my experience, the trickiest situations are when you need to allow raw inputs and other permissive configurations. In such cases, it becomes very hard to sanitize data correctly, and you have to maintain a custom whitelist of allowed characters or manually blacklist some malicious patterns.
It’s recommended to use robust libraries and frameworks instead.
More generally, developers must not hesitate to return errors on bad inputs instead of resorting to guessing or fixing, which is prone to errors and flaws.
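A minimal Python sketch of this "reject, don’t repair" approach follows. The allowed pattern (3 to 20 letters, digits, or underscores) and the function name are assumptions for illustration only.

```python
import re

# Hypothetical policy: 3 to 20 ASCII letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def parse_username(raw: str) -> str:
    """Return the username unchanged, or raise if it breaks the policy."""
    # fullmatch() requires the whole string to match, so nothing can be
    # smuggled in before or after an otherwise valid-looking name.
    if not USERNAME_PATTERN.fullmatch(raw):
        raise ValueError("username must be 3-20 letters, digits, or underscores")
    return raw
```

Because the function never tries to strip or fix bad characters, there is no guessing about what the user meant and no partially cleaned value to worry about.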
Best Practices: Sanitizing Inputs, Validation, Strict Mode
There are some principles and best practices that dev teams can follow for the best possible results. We’ll cover the broad categories, along with specifics to watch for.
Don’t Trust User Inputs
Some websites don’t bother checking user inputs, which exposes the application to the maximum level of danger. Fortunately, that’s getting rarer thanks to security awareness and code analysis. However, incomplete sanitization is not a great solution either.
Here are a few of the possible attack paths you need to think about.
GET requests
If developers don’t sanitize strings correctly, attackers can take advantage of XSS flaws such as:
https://mysite.com/?s=<script>console.log('you are in trouble!');</script>
Classic cybersecurity awareness usually highlights the above example with a simple console.log or even an alert. However, it shows that anyone can execute arbitrary JavaScript on your page by simply sending a shortened version of the malformed URL to unsuspecting victims.
Some XSS flaws can even be persistent (stored in the database, for example). The malicious payload is then served automatically to the website’s users, which spares attackers the hassle of getting a victim to click on something.
Cookies
Websites often use HTTP cookies for session management, customization, and tracking. For example, developers can log in users, remember their preferences, and analyze their behaviors.
The server generates a cookie, a small piece of data, and sends it to the browser, which saves it and returns it with later requests. As a result, stealing a session cookie allows attackers to impersonate the victim and gain immediate access to the targeted account without logging in.
Moreover, hackers don’t have to compromise the victim’s computer. Because HTTP cookies are sent along with each request, attackers can intercept those requests to steal data during man-in-the-middle (MITM) attacks, for example.
A more sophisticated approach can use an XSS attack to insert malicious code into the targeted website to ultimately copy users’ cookies and perform harmful actions in their name.
While Google has discussed phasing out third-party cookies in its Chrome browser, it’s still important to develop best practices for cybersecurity. For example, as of 2022, SSL (Secure Sockets Layer, now succeeded by TLS) is no longer an optional layer. However, if the code sends non-SSL requests, cookies will be sent in plain text, so make sure you are using SSL everywhere.
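Another good practice is to always set the HttpOnly attribute, which prevents JavaScript from reading the cookie and so limits hijacking through XSS. The SameSite attribute is also recommended, as is the Secure attribute, which keeps the cookie off unencrypted connections.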
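For illustration, here is how such a cookie header could be built with Python’s standard http.cookies module. The cookie name "session", the "Lax" setting, and the function name are assumptions for this sketch; the samesite key requires Python 3.8 or later, and most web frameworks expose the same attributes through their own APIs.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str) -> str:
    """Return the value of a Set-Cookie header with hardened attributes."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True   # not readable from JavaScript
    cookie["session"]["secure"] = True     # only sent over HTTPS
    cookie["session"]["samesite"] = "Lax"  # limits cross-site sending
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


print("Set-Cookie: " + build_session_cookie("abc123"))
```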
While cookies are convenient for both users and developers, modern authentication and APIs allow better approaches. Because data stored on the client side is exposed to many security and privacy risks, it’s better to implement other, more secure practices instead.
POST requests
POST requests send data in the request body rather than in the URL, so the submitted values are not exposed in the address. They are used, for example, when you upload an image to your online account or when you submit a contact form, such as:
<form action="https://my-website.com/contact" method="POST">
A common misconception is that POST requests are more secure than GET requests. However, at most, POST requests are security through obscurity. While it is better to use POST requests for actions that modify data, they are not a security measure in themselves and won’t harden security magically.
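One very simple way to handle POST data in PHP is the filter_var() function, which can either sanitize a value (first line) or validate it (second line). Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so FILTER_SANITIZE_FULL_SPECIAL_CHARS is used here instead:
$message = filter_var($_POST['message'], FILTER_SANITIZE_FULL_SPECIAL_CHARS); // encodes HTML special characters
$email = filter_var('bobby.fisher@chess.com', FILTER_VALIDATE_EMAIL); // returns the address, or false if it is invalid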
Another good practice in PHP is to use htmlentities() to escape any unwanted HTML character in a string.
As with cookies, always use SSL to encrypt data, so only TCP/IP information will be left unencrypted.
Directory traversal
If the codebase includes an image tag such as
<img src="/getImages?filename=image12.png" />
then hackers may try using
https://yourwebsite.com/getImages?filename=../../../etc/passwd
to gain access to users’ information.
However, if your server is configured correctly, such attempts to disclose confidential information will be blocked. You should also consider filtering user inputs and ensuring that only the expected formats and data types are transmitted.
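One way to filter the filename parameter is to resolve the requested path and refuse anything that ends up outside the image directory. The sketch below uses Python’s pathlib; the directory /var/www/images and the function name are hypothetical.

```python
from pathlib import Path

IMAGE_DIR = Path("/var/www/images")  # hypothetical storage directory


def resolve_image_path(filename: str, base_dir: Path = IMAGE_DIR) -> Path:
    """Return the absolute path of an image, or raise if it escapes base_dir."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below is made against the real final location.
    candidate = (base / filename).resolve()
    if base not in candidate.parents:
        raise ValueError("requested file is outside the image directory")
    return candidate
```

With this check, a request for image12.png resolves inside the directory, while ../../../etc/passwd or an absolute path such as /etc/passwd raises an error instead of being read.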
Don’t Trust Client-Side Validation
A common mistake, especially for beginners, is to rely on HTML and JavaScript alone to validate form data. While HTML allows defining patterns and required fields, such as setting a character limit or requiring specific fields to be filled, there is no HTML attribute or JavaScript code that can’t be modified on the client side.
Hackers might also submit the form using cURL or any HTTP client, so the client side is absolutely not a secure layer to validate forms.
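To show how little the browser is needed, the following Python sketch builds (without sending) a POST request using only the standard library. The field name num comes from the earlier input example and the URL from the earlier form example; the function name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(url: str, fields: dict) -> Request:
    """Build a form-encoded POST request; no browser or HTML form involved."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, method="POST")


# The HTML said type="number", but nothing enforces that outside the browser.
request = build_form_post("https://my-website.com/contact", {"num": "not-a-number"})
print(request.get_method(), request.full_url, request.data)
```

Since any client can produce such a request, every constraint expressed in HTML or JavaScript has to be checked again on the server.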
Enable Strict Mode
Whenever you can, enable strict mode, whether it’s in PHP, JavaScript, SQL, or any other language. However, as strict mode prevents lots of convenient syntaxes, it might be difficult to enable if you have significant technical debt and legacy code.
On the other hand, if you don’t code in strict mode, the engine starts making guesses and can even modify values automatically to make the code work. This opens up vulnerabilities hackers can utilize to inject malicious commands.
For example, in 2015, Andrew Nacin, a major contributor to WordPress, explained how a critical security bug could have been avoided just by enabling strict mode in SQL. He demonstrated how hackers could exploit a critical vulnerability by using four-byte characters to force MySQL truncation and then inject malicious code in the database.
A simple solution to prevent such an attack would be to execute the following command, but enabling it across the board risked breaking websites powered by WordPress:
SET SESSION sql_mode = "STRICT_ALL_TABLES";
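The danger of silent truncation can be pictured with a toy Python model. This is not MySQL code: it only assumes, for illustration, a column whose character set stores at most three bytes per character, so that a four-byte character either cuts the value short (non-strict) or causes an error (strict). The function name is hypothetical.

```python
def store_text(value: str, strict: bool) -> str:
    """Toy model of writing text to a column limited to 3-byte characters."""
    for index, ch in enumerate(value):
        if len(ch.encode("utf-8")) > 3:
            if strict:
                # Strict mode: refuse the write instead of guessing.
                raise ValueError("incorrect string value")
            # Non-strict mode: silently keep only what came before.
            return value[:index]
    return value


comment = "safe text \U0001F600 rest of the comment"
print(repr(store_text(comment, strict=False)))  # stored value differs from input
```

In the non-strict case the stored value is no longer the one that was validated, which is the gap such an attack relies on.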
Consult the OWASP Web Testing Guide
OWASP, the Open Web Application Security Project, maintains comprehensive documentation called the Web Security Testing Guide (WSTG) that includes input validation.
This guide offers information on how to test various injections and other sneaky attacks on inputs. The content is frequently updated, and there are detailed explanations for various scenarios.
For example, you can check out their page on Testing for Stored Cross Site Scripting to learn how persistent XSS works and how to reproduce the exploit.
Bottom Line: Sanitize, Validate, and Escape Late
Sanitizing and validating inputs is a mandatory dev practice but you cannot apply a generic solution to all entries. You have to consider the specific contexts to be able to block injections. Moreover, don’t store anything in the database without validating it, but also escape values before displaying them, as some injections can poison database records.
Another essential practice is to escape data as late as possible, preferably just before display. This way, you know the exact final context and are far less likely to leave data unescaped.
Lastly, spend time on fine-tuning static code analysis. This process can tend to generate a lot of false positives, such as XSS flaws that can’t be exploited; however, every single HTML attribute and tag that gets its value dynamically should be escaped.
While hackers won’t be able to exploit all tags to grab sensitive data or trick logged in users, you should still incorporate static analysis to prevent as many vulnerabilities as possible.