Best-practice methods to prevent CSV formula injection attacks in Node.js, Django, Flask, Java & PHP

When we do penetration tests at Audacix, one of the most common vulnerabilities we come across is being vulnerable to CSV/Formula injection attacks.

Developers often need to provide users with the ability to export complex data from applications into CSV (Comma-Separated Values) files. While this is a highly convenient feature, CSV files also come with a significant security risk: CSV injection attacks.

CSV injection, also known as formula injection, occurs when a website or web application allows users to export data to a CSV file to be used with spreadsheet applications without properly validating the content. Without this crucial validation step, the exported CSV file could contain maliciously crafted formulas.

The threat posed by CSV injection attacks is not to be taken lightly. Cybercriminals can leverage this vulnerability to gain unauthorized access, steal data, or even launch more sophisticated attacks.

In this best-practice guide, I'll give you an in-depth look at CSV formula injection and provide you with practical steps to prevent CSV formula injection attacks in your software.

Below, I've given you code snippets you can use in your app to prevent these vulnerabilities. Use our boilerplate code as you see fit, but please test it thoroughly for functional and security bugs before shipping to prod. By using any code provided here, you accept that it's provided for educational purposes and we cannot be held responsible for any errors, ommissions or liability for your use of this code or any concepts discussed here.

What is CSV Formula Injection?

CSV formula injection occurs when an attacker injects malicious formulas or code into the CSV data, taking advantage of how spreadsheet applications interpret certain characters. Specifically, the attacker may insert formulas starting with the following characters:

Equal sign (=)
Plus sign (+)
Minus sign (-)
At symbol (@)
Tab (0x09)
Carriage return (0x0D)

When a spreadsheet editor program such as Microsoft Excel or LibreOffice Calc, Google Sheets is used to open a CSV/Excel, any cells starting with = will be interpreted by the software as a formula. Maliciously crafted formulas can be used for three key attacks:

Hijacking the user’s computer by exploiting vulnerabilities in the spreadsheet software.
Hijacking the user’s computer by exploiting the user’s tendency to ignore security warnings in spreadsheets that they downloaded from their own website.
Exfiltrating contents from the spreadsheet, or other open spreadsheets.

How does CSV Formula Injection Attacks Work?

For example, consider the following CSV file with columns "Name,Email,Phone". Now check out these possible entries for each column:

John Doe,john@example.com,=SUM(1,2)
Jane Smith,jane@example.com,555-1234

In the first example, the attacker has injected the formula =SUM(1,2) into the "Phone" column. When the user opens the CSV file, the spreadsheet application will interpret this as a formula and display the result (3) instead of the actual phone number.

But the potential damage goes much further. Attackers can use this technique to execute more complex and harmful actions, such as:

Downloading and running malware: The attacker can craft a formula that causes the spreadsheet application to download and execute a malicious executable file and execute malicious code
Stealing sensitive data: The attacker can create a formula that sends the user's data to a remote server controlled by the attacker's server
Gaining unauthorized system access: The attacker can use a formula to execute arbitrary commands on the user's system, potentially granting them full access.

Preventing CSV Formula Injection Attacks in Python (Django/Flask)

To protect your software from CSV/Formula injection attacks in Python code, follow these best practices:

Validate and sanitize user input

Before including any user-supplied data in the generated CSV file, thoroughly validate and sanitize it to remove any potentially malicious characters or formulas:

import csv
import html

def sanitize_csv_data(data):
    """Sanitize CSV data to prevent formula injection attacks."""
    sanitized = []
    for row in data:
        sanitized_row = []
        for cell in row:
            sanitized_cell = str(cell).replace('=', '').replace('+', '').replace('-', '').replace('@', '')
            sanitized_cell = html.escape(sanitized_cell)
            sanitized_row.append(sanitized_cell)
        sanitized.append(sanitized_row)
    return sanitized

Wrap field values in double quotes

Enclose each field value in double quotes to ensure that the spreadsheet application interprets the content as text rather than a formula:

import csv

def write_csv_file(filename, data):
    """Write sanitized data to a CSV file."""
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in data:
            quoted_row = [f'"{str(cell)}"' for cell in row]
            writer.writerow(quoted_row)

Prepend field values with a single quote

As an additional measure, you can prepend each field value with a single quote (') to further ensure that the spreadsheet application treats the content as text:

import csv

def write_csv_file(filename, data):
    """Write sanitized data to a CSV file."""
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in data:
            quoted_row = [f"'{str(cell)}'" for cell in row]
            writer.writerow(quoted_row)

Escape double quotes within field values

If your data contains double quotes, be sure to escape them by using two consecutive double quotes (""):

import csv

def write_csv_file(filename, data):
    """Write sanitized data to a CSV file."""
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in data:
            quoted_row = [f'"{str(cell).replace('"', '""')}"' for cell in row]
            writer.writerow(quoted_row)

Preventing CSV Formula Injection Attacks in Java

Apps written in Java or variants of Java are also vulnerable to CSV injection attacks and this is a step-by-step guide on how to prevent these attacks in your app.

Sanitizing CSV Data in Java

When generating CSV files in Java, you can use the java.io.PrintWriter class to write the data, and the java.lang.String#replace() method to sanitize the input.

Here's an example:

import csv
import html

def sanitize_csv_data(data):
    """Sanitize CSV data to prevent formula injection attacks."""
    sanitized = []
    for row in data:
        sanitized_row = []
        for cell in row:
            sanitized_cell = str(cell).replace('=', '').replace('+', '').replace('-', '').replace('@', '')
            sanitized_cell = html.escape(sanitized_cell)
            sanitized_row.append(sanitized_cell)
        sanitized.append(sanitized_row)
    return sanitized

In this example, the exportCsv() method takes a file name and a 2D array of data, and writes the data to a CSV file. The key points are:

Wrapping each cell value in double quotes.
Escaping any existing double quotes within the cell value by replacing them with two consecutive double quotes ("").

This approach ensures that the spreadsheet application will interpret the cell values as text, rather than as formulas that could be used for a CSV injection attack.

Validating CSV Data in Java

In addition to sanitizing the CSV data, you should also validate the input before generating the CSV file. This can be done using regular expressions or other validation techniques.

Here's an example of using regular expressions to validate the input:

import java.util.regex.Pattern;

public class CsvValidator {
    private static final Pattern FORMULA_PATTERN = Pattern.compile("^(=|\\+|-|@).*");

    public static boolean isCsvDataValid(String[][] data) {
        for (String[] row : data) {
            for (String cell : row) {
                if (FORMULA_PATTERN.matcher(cell).matches()) {
                    return false;
                }
            }
        }
        return true;
    }
}

The isCsvDataValid() method takes a 2D array of data and checks each cell value for the presence of characters that could be used in a CSV injection attack (=, +, -, @). If any of these characters are found at the beginning of a cell value, the method returns false, indicating that the data is not safe to export.

Combining Sanitization and Validation

To ensure that your CSV export functionality is secure, you should combine the sanitization and validation steps. Here's an example of how you could do this:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.regex.Pattern;

public class SecureCsvExporter {
    private static final Pattern FORMULA_PATTERN = Pattern.compile("^(=|\\+|-|@).*");

    public static void exportSecureCsv(String fileName, String[][] data) {
        if (!isCsvDataValid(data)) {
            throw new IllegalArgumentException("CSV data contains potentially malicious content.");
        }

        try (PrintWriter writer = new PrintWriter(fileName)) {
            for (String[] row : data) {
                StringBuilder rowBuilder = new StringBuilder();
                for (String cell : row) {
                    rowBuilder.append("\"")
                              .append(cell.replace("\"", "\"\""))
                              .append("\",");
                }
                writer.println(rowBuilder.substring(0, rowBuilder.length() - 1));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static boolean isCsvDataValid(String[][] data) {
        for (String[] row : data) {
            for (String cell : row) {
                if (FORMULA_PATTERN.matcher(cell).matches()) {
                    return false;
                }
            }
        }
        return true;
    }
}

In this example, the exportSecureCsv() method first checks the input data for any potentially malicious content using the isCsvDataValid() method. If the data is valid, it proceeds to sanitize the data and write the CSV file. If the data is not valid, it throws an IllegalArgumentException to inform the caller that the data cannot be exported safely.

Validating CSV Data in PHP

Here's an example function to sanitize inputs and prevent formula injection:

function preventFormulaInjection($input) {
    // List of characters that could trigger formula execution in spreadsheets
    $dangerousChars = ['=', '+', '-', '@'];
    
    // Check if the input starts with any of these characters
    if (in_array($input[0], $dangerousChars)) {
        // Prepend a single quote to neutralize the formula
        $input = "'" . $input;
    }

    return $input;
}

You can use this function on any data that will be exported to a spreadsheet:

$data = preventFormulaInjection($_POST['user_input']);
// Now you can safely export $data to your spreadsheet

Validating CSV Data in Node.js:

Here’s how you can implement formula injection prevention in Node.js using csv-parser library and a custom sanitization function.

Install Required Packages:

npm install csv-parser fast-csv

Sanitize Function: Create a function to sanitize cell data by prefixing dangerous characters.

function sanitizeCell(cell) {
  const dangerousChars = ['=', '+', '-', '@'];
  if (typeof cell === 'string' && dangerousChars.includes(cell[0])) {
    return `'${cell}`;
  }
  return cell;
}

Processing CSV Data: Use the fast-csv library to parse and sanitize CSV data.

const fs = require('fs');
const fastcsv = require('fast-csv');

const sanitizeCell = (cell) => {
  const dangerousChars = ['=', '+', '-', '@'];
  if (typeof cell === 'string' && dangerousChars.includes(cell[0])) {
    return `'${cell}`;
  }
  return cell;
};

fs.createReadStream('input.csv')
  .pipe(fastcsv.parse({ headers: true }))
  .transform((row) => {
    for (let key in row) {
      row[key] = sanitizeCell(row[key]);
    }
    return row;
  })
  .pipe(fastcsv.format({ headers: true }))
  .pipe(fs.createWriteStream('output.csv'));

The sanitizeCell function checks if a cell starts with a dangerous character and prefixes it with a single quote.

The CSV is parsed row by row, and each cell is sanitized before writing it to the new CSV file.

What can you do next?

By implementing these safeguards, you can effectively protect your software and your users from the dangers of CSV formula injection attacks. Remember to thoroughly test your CSV export functionality to ensure that your mitigation measures are working as expected.

However, if you have this CSV injection vulnerability in your app then you probably have other vulnerabilities that you need to find and patch.

There are very few tools that allow software developers to protect their web applications against hackers - without the help of specialist application security experts.

However, Cyber Chief is one such tool that allows you to run regular vulnerability scans with a web app vulnerability scanning tool for vulnerabilities like this and others like JWT token security, rate-limiting vulnerabilities and thousands more.

You can see how Cyber Chief works now to understand not only how it can help to keep attackers out, but also to see how you can ensure that you ship every release with zero known vulnerabilities.

Cyber Chief helps you run automated tests for your web app security, API security and cloud infrastructure security and each subscription comes with:

Results from scanning your application for the presence of OWASP Top 10 + SANS CWE 25 + thousands of other vulnerabilities.
A detailed description of the vulnerabilities found.
A risk level for each vulnerability, so you know which GraphQL endpoints to fix first.
Best-practice fixes for each vulnerability, including code snippets where relevant.
On-demand security coaching support from our application security experts to help you patch vulnerabilities faster.

Here's what you can do next.

See Cyber Chief In Action >>

Run Express Scan On Your App

Tuesday, September 3, 2024