Cleaning Up Double Spaces in SQL: Why and How
Why Do We Need to Clean Double Spaces?
If you work with data, you’ve probably seen it: “Messy text with too many spaces.”
Double (or more) spaces sneak into data for all sorts of reasons—manual entry, copy-paste errors, or system quirks. These extra spaces can cause problems:
SQL doesn’t have a built-in, one-step way to collapse all double (or more) spaces into single spaces. That’s where a reusable function comes in handy!
The Clever Approach
Replace all spaces with two special characters! Simple really 🤭
Here’s how it works, step by step:
The function has defaults of CHAR(17) and CHAR(18) - WHAT ARE THEY?
CHAR(17) and CHAR(18) are simply ASCII characters that are rarely used in typical text. Think of them like secret codes or placeholders.
These are non-printable control characters from the ASCII (American Standard Code for Information Interchange) table. In the early days of computing, these characters were used to control hardware devices (like printers or terminals), but they don't have a visual representation when you type them or display them in most modern contexts.
Why they are used in this context:
The cleverness of using CHAR(17) and CHAR(18) in the double-space cleaning function comes from their rarity in actual text data.
The goal of the "marker" method is to temporarily replace spaces with something unique that won't accidentally be part of your original string. If you used something common like ## as a marker, and your original string was "Price ## 100", the function would mess up your data.
By choosing CHAR(17) and CHAR(18), the chances of them naturally appearing in a user's name, address, or product description are extremely low. This makes them ideal "safe" temporary placeholders for the string manipulation trick.
Why Is This Function Good Practice?
The Solution: A Reusable, Safe SQL Function
CREATE OR ALTER FUNCTION Toolbox.CleanDoubleSpaces
(
@InputString NVARCHAR(MAX),
@TempChar_1 NCHAR(1),
@TempChar_2 NCHAR(1)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Result NVARCHAR(MAX);
DECLARE @TempMarker NVARCHAR(2);
DECLARE @OverlapMarker NVARCHAR(2);
DECLARE @TempChar1 NCHAR(1);
DECLARE @TempChar2 NCHAR(1);
SELECT @TempChar1 = ISNULL(@TempChar_1,CHAR(17))
SELECT @TempChar2 = ISNULL(@TempChar_2,CHAR(18))
-- Handle NULL input
IF @InputString IS NULL
RETURN NULL;
-- Handle empty string
IF LEN(@InputString) = 0
RETURN @InputString;
-- Handle no spaces
IF CHARINDEX(' ', @InputString) is NULL
RETURN @InputString;
-- Create our marker strings
SET @TempMarker = @TempChar1 + @TempChar2;
SET @OverlapMarker = @TempChar2 + @TempChar1;
-- Apply the three-step process
SET @Result = REPLACE(
REPLACE(
REPLACE(@InputString, ' ', @TempMarker),
@OverlapMarker, ''
),
@TempMarker, ' '
);
RETURN @Result;
END;
Ensuring the Toolbox Schema Exists
Here’s how you can ensure the toolbox schema exists before creating your function. You only need to run this code once per database, but it’s a good habit to include it in your deployment scripts.
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'Toolbox')
EXEC('CREATE SCHEMA Toolbox');
GO
Why Is This Better Than Looping?
A common way to remove double spaces is to use a loop: You keep replacing ' ' (two spaces) with ' ' (one space) over and over until there are no more double spaces left.
But looping has drawbacks:
The marker method is better because:
In short: The marker method is more efficient, more reliable, and more readable than looping!
Usage Examples
-- Basic usage (safe for all data)
SELECT Toolbox.CleanDoubleSpaces('Hello world! This is SQL.', null, null);
-- Returns: 'Hello world! This is SQL.'
-- With custom temporary characters (if you need to avoid CHAR(17) and CHAR(18))
SELECT Toolbox.CleanDoubleSpaces('Messy data', CHAR(1), CHAR(2));
Extra Tips and Best Practices
1. Be Mindful of Data Types
2. Handling NULLs and Empty Strings
3. Performance
Conclusion
Cleaning up double spaces in SQL is a common need, and this function makes it easy, efficient, and safe for all your data. Add it to your toolkit to keep your data—and your reports—looking sharp!
Happy cleaning!
Healthcare Data Leadership and Execution
1moThis seems like a clean cut scenario to use regex. I'm curious what would lead one to create a function like this?
Many SQL DBMSes allow you to use SELECT REGEXP_REPLACE('Hello world! This is SQL.',' +',' ','g')
Data Platform MVP (2020-2023) | Certified Expert in Fabric Analytics Engineering, Azure Data Engineering, Solutions Architecture, DevOps, and Development
1moData cleaning techniques are always important!
Taking a machete to analytics inefficiencies
1moThat's actually pretty slick. Of course, if you're a bit of a chaos monkey, once you get everything down to a single space, replace all the single spaces with chr(255) :D
Head of Data | Digital Transformation Leader | Author of “Beyond the Numbers” Newsletter | Open to New Opportunities
1moNice, like that James Reeves thank you 🙏