This is the first post in what I hope to be a mini-series into the main data types that a beginner will come across when using T-SQL. First in this series is Character Functions and how to manipulate Strings.
What is a String?
A string is a data type used for values that are made up of ordered sequences of characters, such as “hello world”. A string can contain any sequence of characters, visible or invisible, and characters may be repeated. The number of characters in the string is called its length, and “hello world” has length 11 – made up of 10 letters and 1 space. There is usually a restriction on the maximum length of a string. There is also such a thing as an empty string, which contains no characters – length 0. (credit for the definition of a string goes to BBC Bitesize). It is fair to assume that most databases contain strings and, as such, this is probably a good place to start this series.
String Manipulation and Functions
Whilst T-SQL may not have been designed with advanced character string manipulation as one of its key features, it is something that I find myself doing quite frequently. Here is an outline of the main ways T-SQL provides to manipulate strings.
Concatenation is the act of combining multiple string together to form 1 new string. For instance “First Name” added to “Surname” to form “Full Name”. T-SQL provides 2 ways to do this, the plus (+) operator, and the CONCAT function.
This first example is string concatenation using the + operator.
, country + N',' + region + N',' + city AS 'location'
And this is what is returned :
As you can see, when any value of the inputs is NULL then the value returned using the + operator is also NULL. Clearly, there will be times when NULL is not what you want in your output. Luckily, there are 2 ways around this, CONCAT and COALESCE.
CONCAT – By default, this substitutes NULLs with empty strings.
COALESCE (<expression>, ”) – This is used to join to show an empty string when the region is NULL in the below example.
, CONCAT (country, N',' + region, N',' + city) AS 'location_concat'
, country + COALESCE( N',' + region, N'') + N',' + city AS 'location_coalesce'
(not sure why CONCAT didn’t go pink in this code!)
And you can see the results are the same from each of these below :
There are a few functions to perform actions to a part of a string such as pattern matching, and extracting part of a string.
The SUBSTRING function allows you to extract only a section of a string. you specify the input string, the position of the string to start from, and the length of the substring required. e.g, to return abc from abcdef you could use:
SELECT SUBSTRING('abcdef',1,3) AS' substring'
There are also LEFT and RIGHT functions that work in much the same way, except that you only need to specify the number of characters required from the left or right ends of the string. For instance :
SELECT LEFT ('abcdef' ,3) AS 'left'
, RIGHT('abcdef' ,3) AS 'right'
Finally with SUBSTRING there is CHARINDEX and PATINDEX which return the position in a string of particular characters or patterns as a numeric value. For instance the below returns 5 as ‘ef’ starts at the 5th position of the string ‘abcdef’
SELECT CHARINDEX('ef', 'abcdef')
PATINDEX works in much the same way but can be used to find patterns rather than a constant string.
You can combine a few of these together to get some quite clever results. Here I used CHARINDEX and LEFT to separate a 1st name from a 2nd name :
SELECT LEFT('Daniel Blank', CHARINDEX(' ', 'Daniel Blank') -1)
This just returns ‘Daniel’
T-SQL has 2 functions for measuring the length of an input value, LEN and DATALENGTH. LEN returns the length of an input string in terms of the number of characters (ignoring trailing spaces). DATALENGTH returns the length of the input in terms of the number of bytes a string takes up. For instance :
SELECT LEN ('Daniel Blank') AS 'len'
, DATALENGTH (N'Daniel Blank') AS 'datalength'
There are three supported T-SQL functions used to alter an input string. REPLACE, REPLICATE, and STUFF. These are pretty self-explanatory from the code below but I will give a brief explanation of each.
SELECT REPLACE ('Daniel.Blank','.',' ') AS 'replace'
, REPLICATE('a',10) + 'rgh!' AS 'replicate'
, STUFF('abcdefghijklmnop', 5, 5, '12345') AS 'stuff'
REPLACE lets you replace part of a string by specifying the input, the substring to be replaced, and the substring to replace with.
REPLICATE allows you to repeat an input string a specified number of times. In the above, we replicated the letter ‘a’ 10 times.
STUFF deletes a section of a string and adds in another string in its place. You specify the input string, the character position to start the delete, how many characters to delete, and then the string to ‘stuff’ into its place.
Lastly we have the functions to apply formatting options to a string. These are UPPER, LOWER, LTRIM, RTRIM and FORMAT. The first four are fairly self-explanatory once you see what they do, but FORMAT may take a little explanation.
SELECT UPPER('DAniEl bLANk') AS 'upper'
, LOWER('DAniEl bLANk') AS 'lower'
, LTRIM(' Daniel Blank') AS 'ltrim' , RTRIM('Daniel Blank ') AS 'rtrim'
, RTRIM(LTRIM(' DanielBlank ')) AS 'l_and_r_trim'
, FORMAT(123456,'000000000') AS 'format'
UPPER and LOWER allow you to format a string in either upper or lowercase characters.
LTRIM and RTRIM allow the removal of either leading or trailing spaces respectively.
FORMAT allows you to format an input value based on a format string. So in the above example the number 123456 is formatted as a character string with a size of 9 characters with leading zeros.
String are one of the fundamental data types that you will come across when working with databases. Above I have covered some of the most common way of dealing with this data types. Hopefully someone out there finds it interesting / useful. For me, it’s good to have this typed up to reinforce what I have been covering whilst studying for my MCSA.
Note: A lot of this information has been sourced from the official Microsoft 70-461 Exam Training Kit which I am currently working through.