Pivoting Data in SQL Server

Hi Blog, long time no see! How’s it going?!  I know, I really should up my blog game….

I was recently approached by my firm’s Marketing Manager with a request for some information.  She wanted to know “Which departments have our top clients never done any work for?”.  For some clarity, I work in a law firm with 11 departments. The request seems pretty straightforward at first. Then, once I got thinking about how the output of this report would be presented, I had to reconsider quite how simple a request this was.  She handed me a drawing with her vision for the output.  What she wanted was :

  1. Client names down the left,
  2. List of Departments across the top,
  3. Ticks and crosses at the intersection to show whether each department had or had not done work for each client.

Finding out the basic information she was asking for was not going to be tough, but I had concerns about the presentation. Sure, I could output it in any old way, import into Excel, do some Pivot Table jiggery-pokery and get it to look how she wanted.  But where is the opportunity for learning in that approach?! No, I decided that I was going to have the query itself output in the way she wanted somehow….Pivot Table…… Pivot……. Table….That’s when 2 things hit me:

  1. I had recently been reading a chapter in my SQL MCSA book about pivoting and unpivoting data that I had been struggling with.  I had been waiting for a real-world example to sink my teeth into, and this seemed like just the thing.
  2. Pivot Tables in Excel and PIVOT in SQL are not called the same thing by accident!

Technical Bit

Before you can concern yourself with PIVOTING any data, the first thing you need to do is get the data to be pivoted. This requires 3 elements :

  1. Grouping Column – this is what will be your rows,
  2. Spreading Columns – this will be the columns,
  3. Aggregation Column – this will be what is displayed at the intersection of your rows and columns.

For me, my Grouping Column was the Client’s name, the Spreading Columns were the departments, and the Aggregation Column was whether the department had done work for that client (I was going to get this data by doing a count of the number of legal matters that had been opened).

Once you have your data, then the next thing to do is to PIVOT it. Below you can see the syntax for setting out your 3 data columns as a CTE to then be pivoted.

WITH PivotData AS
(
    SELECT
        [grouping column],
        [spreading column],
        [aggregation column]
    FROM [source table]
)
SELECT [select list]
FROM PivotData
    PIVOT ([aggregate function]([aggregation column])
        FOR [spreading column] IN ([distinct spreading values]) ) AS P;

When I first saw this, the actual pivoting part looked pretty daunting.  Turns out, it’s really not too bad. It just looks daunting when it’s laid out in front of you. Once you understand what’s going on, it makes a lot of sense.

  • [SELECT LIST] –  This is where you say what you want to take from your original query and lay out as columns.  So for my example, my first column is the clients, and the subsequent columns are each of my departments individually.
  • [AGGREGATE FUNCTION]([AGGREGATION COLUMN]) – You need to tell SQL what to do with the data that you want at the intersection of your rows and columns. For me, this was a count of the number of cases that each department had opened for each client.
  • [SPREADING COLUMN] IN ([DISTINCT SPREADING VALUES]) – Which values from your spreading column (the data that will make up your column headings) you want. My spreading column was our Departments, and my distinct spreading values were the names of each of the Departments I wanted to check against.
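To see those three pieces in action before the real thing, here is a minimal sketch you can run anywhere. The client names, departments and counts are made-up sample data (supplied through a VALUES list rather than real tables), so treat it as an illustration of the syntax only.

```sql
-- Hypothetical sample data: one row per client / department with a matter count.
WITH PivotData AS
(
    SELECT Client, Dept, DCount
    FROM (VALUES
          (N'Client A', N'Corporate',  3)
        , (N'Client A', N'Employment', 1)
        , (N'Client B', N'Corporate',  2)
        , (N'Client B', N'Family',     4)
    ) AS V (Client, Dept, DCount)
)
SELECT Client            -- grouping column: one row per client
    , [Corporate]        -- one output column per spreading value
    , [Employment]
    , [Family]
FROM PivotData
PIVOT (SUM(DCount) FOR Dept IN ([Corporate], [Employment], [Family])) AS P
ORDER BY Client;
```

Client A comes back with 3, 1 and NULL, and Client B with 2, NULL and 4. The NULLs are the combinations that have no rows in the source data.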

I appreciate that my explanation may still sound somewhat complicated, but hang in there.  When you see my final working code and its output, hopefully it’ll all fall into place.

WITH PivotData
AS (
        SELECT E.Name AS 'Client'
            , D.Description AS 'Dept'
            , COUNT(D.Description) AS 'DCount'
        FROM DAB_MattersALL AS M
        JOIN Users AS U ON M.FeeEarnerRef = U.Code
        JOIN Departments AS D ON U.Department = D.Code
        JOIN Entities AS E ON M.Entityref = E.Code
        WHERE M.EntityRef IN (
                         SELECT TOP (20) E.Code AS 'Client'
                         FROM Departments AS D
                         INNER JOIN Users AS U ON D.Code = U.Department
                         INNER JOIN Ac_Billbook AS A ON U.Code = A.SubmittingFeeEarner
                         INNER JOIN Entities AS E ON A.EntityRef = E.Code
                         WHERE A.BillDate >= '2008-01-01 00:00:00'
                             AND A.BillDate < GETDATE()
                         GROUP BY E.Code
                             , E.Name
                         ORDER BY CAST(SUM(A.CostsNet) AS MONEY) DESC
                         )
        GROUP BY E.Name
            , D.Description
    )

SELECT Client
    , [Banking]
    , [Commercial and IP]
    , [Commercial Property]
    , [Construction]
    , [Corporate]
    , [Employment]
    , [Family]
    , [Intellectual Property]
    , [Licensing]
    , [Litigation]
    , [Management]
    , [Real Estate]
    , [Residential Property]
    , [Tax and Probate]
FROM PivotData
PIVOT(SUM(DCOUNT) FOR Dept IN (
                        [Banking]
                        , [Commercial and IP]
                        , [Commercial Property]
                        , [Construction]
                        , [Corporate]
                        , [Employment]
                        , [Family]
                        , [Intellectual Property]
                        , [Licensing]
                        , [Litigation]
                        , [Management]
                        , [Real Estate]
                        , [Residential Property]
                        , [Tax and Probate]
                        )) AS P
ORDER BY Client

That looks like a giant wall of text, right?!  It’s not, don’t worry.  When you break it down it’s simple!  The first part is just setting out what I want (Clients, Departments, and the count of the Departments).

The PIVOT part is saying – Give me a column named “Client”, and then subsequent columns named after each possible department that may have done work for them.  At the intersection between each client and each department, give me the number of cases each client has had with the departments in the following list.

Finally, this is the output (with client names poorly redacted)  that you get from all this :

[Screenshot: pivoted query output]
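The query above returns counts (or NULL where a department has no matters for a client), whereas the original drawing asked for ticks and crosses. One way to get there, sketched here against made-up sample data rather than the real tables, is to wrap each pivoted column in a CASE expression. The tick and cross characters are my own choice for illustration.

```sql
-- Hypothetical sample data standing in for the real PivotData CTE.
WITH PivotData AS
(
    SELECT Client, Dept, DCount
    FROM (VALUES
          (N'Client A', N'Corporate',  3)
        , (N'Client A', N'Employment', 1)
        , (N'Client B', N'Corporate',  2)
    ) AS V (Client, Dept, DCount)
)
SELECT Client
    -- A NULL count fails the > 0 test, so it falls through to the cross.
    , CASE WHEN [Corporate]  > 0 THEN N'✓' ELSE N'✗' END AS [Corporate]
    , CASE WHEN [Employment] > 0 THEN N'✓' ELSE N'✗' END AS [Employment]
FROM PivotData
PIVOT (SUM(DCount) FOR Dept IN ([Corporate], [Employment])) AS P
ORDER BY Client;
```

With this sample data Client A gets a tick in both columns, while Client B gets a tick for Corporate and a cross for Employment.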

Summary

So, I hope that made sense. Writing this out really helped cement it in my brain as something that I now understand. I find reading about new techniques in SQL (or anything else really) can be really daunting and demoralising until I find a practical application.  That’s when it all seems to make sense.

Pivoting is now something I feel pretty confident with and is another arrow in my quiver of SQL functions I can call upon when necessary.

SQL Data Types – Strings

This is the first post in what I hope will be a mini-series on the main data types that a beginner will come across when using T-SQL.  First in this series is Character Functions and how to manipulate Strings.

What is a String?

A string is a data type used for values that are made up of ordered sequences of characters, such as “hello world”. A string can contain any sequence of characters, visible or invisible, and characters may be repeated. The number of characters in the string is called its length, and “hello world” has length 11 – made up of 10 letters and 1 space. There is usually a restriction on the maximum length of a string. There is also such a thing as an empty string, which contains no characters – length 0. (credit for the definition of a string goes to BBC Bitesize).  It is fair to assume that most databases contain strings and, as such, this is probably a good place to start this series.

String Manipulation and Functions

Whilst T-SQL may not have been designed with advanced character string manipulation as one of its key features, it is something that I find myself doing quite frequently.  Here is an outline of the main ways T-SQL provides to  manipulate strings.

Concatenation

Concatenation is the act of combining multiple strings together to form 1 new string.  For instance “First Name” added to “Surname” to form “Full Name”.  T-SQL provides 2 ways to do this, the plus (+) operator, and the CONCAT function.

This first example is string concatenation using the + operator.

SELECT empid
    , country
    , region
    , city
    , country + N',' + region + N',' + city AS 'location'
FROM HR.Employees

And this is what is returned :

[Screenshot: query results]

As you can see, when any of the input values is NULL then the value returned using the + operator is also NULL.  Clearly, there will be times when NULL is not what you want in your output.  Luckily, there are 2 ways around this, CONCAT and COALESCE.

CONCAT – By default, this substitutes NULLs with empty strings.

COALESCE (<expression>, '') – This returns the first of its arguments that is not NULL, so in the below example it substitutes an empty string when the region is NULL.

SELECT empid
    , country
    , region
    , city
    , CONCAT (country, N',' + region, N',' + city) AS 'location_concat'
    , country + COALESCE( N',' + region, N'') + N',' + city AS 'location_coalesce'
FROM HR.Employees

(not sure why CONCAT didn’t go pink in this code!)

And you can see the results are the same from each of these below :

[Screenshot: query results]

Substrings

There are a few functions that act on part of a string, such as pattern matching and extracting a substring.

The SUBSTRING function allows you to extract only a section of a string.  You specify the input string, the position to start from, and the length of the substring required. For example, to return abc from abcdef you could use:


SELECT SUBSTRING('abcdef',1,3) AS 'substring'

There are also LEFT and RIGHT functions that work in much the same way, except that you only need to specify the number of characters required from the left or right ends of the string.  For instance :


SELECT LEFT ('abcdef' ,3) AS 'left'

    , RIGHT('abcdef' ,3) AS 'right'

[Screenshot: query results]

Finally, alongside SUBSTRING there are CHARINDEX and PATINDEX, which return the position of a particular substring or pattern within a string as a numeric value.  For instance, the below returns 5, as ‘ef’ starts at the 5th position of the string ‘abcdef’ :


SELECT CHARINDEX('ef', 'abcdef')

PATINDEX works in much the same way but can be used to find patterns rather than a constant string.
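As a quick sketch of the difference, the pattern below uses the same wildcard syntax as LIKE to find the first digit in a string. The input values are just made-up examples.

```sql
SELECT PATINDEX('%[0-9]%', 'ABC123') AS 'first_digit'   -- 4: the first digit is the 4th character
    , PATINDEX('%[0-9]%', 'abcdef') AS 'no_digit'        -- 0: no match found
    -- Combine with LEFT to keep only the characters before the first digit.
    , LEFT('ABC123', PATINDEX('%[0-9]%', 'ABC123') - 1) AS 'letters_only';  -- ABC
```

Note that both CHARINDEX and PATINDEX return 0 when nothing is found, so it is worth checking for that before subtracting 1 and passing the result to LEFT.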

You can combine a few of these together to get some quite clever results.  Here I used CHARINDEX and LEFT to separate a 1st name from a 2nd name :


SELECT LEFT('Daniel Blank', CHARINDEX(' ', 'Daniel Blank') -1)

This just returns ‘Daniel’.

Length

T-SQL has 2 functions for measuring the length of an input value, LEN and DATALENGTH.  LEN returns the length of an input string in terms of the number of characters (ignoring trailing spaces).  DATALENGTH returns  the length of the input in terms of the number of bytes a string takes up.  For instance :

SELECT LEN ('Daniel Blank') AS 'len'
      , DATALENGTH (N'Daniel Blank') AS 'datalength'

[Screenshot: query results]
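The trailing spaces point is easier to see with a string that actually has some. A small sketch, using a made-up value of ‘Daniel’ followed by 3 spaces:

```sql
SELECT LEN('Daniel   ') AS 'len'                          -- 6: trailing spaces are ignored
    , DATALENGTH('Daniel   ') AS 'datalength_varchar'     -- 9: 1 byte per character, spaces included
    , DATALENGTH(N'Daniel   ') AS 'datalength_nvarchar';  -- 18: 2 bytes per character for Unicode (N'') strings
```

This is also why the DATALENGTH of the Unicode string in the earlier example is double its LEN.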

Alteration

There are three supported T-SQL functions used to alter an input string: REPLACE, REPLICATE, and STUFF.  These are pretty self-explanatory from the code below but I will give a brief explanation of each.


SELECT REPLACE ('Daniel.Blank','.',' ')  AS 'replace'

, REPLICATE('a',10) + 'rgh!' AS 'replicate'

, STUFF('abcdefghijklmnop', 5, 5, '12345') AS 'stuff'

[Screenshot: query results]

REPLACE lets you replace part of a string by specifying the input, the substring to be replaced, and the substring to replace with.

REPLICATE allows you to repeat an input string a specified number of times.  In the above, we replicated the letter ‘a’ 10 times.

STUFF deletes a section of a string and adds in another string in its place. You specify the input string, the character position to start the delete, how many characters to delete, and then the string to ‘stuff’ into its place.

Formatting

Lastly we have the functions to apply formatting options to a string.  These are UPPER, LOWER, LTRIM, RTRIM and FORMAT.  The first  four are fairly self-explanatory once you see what they do, but FORMAT may take a little explanation.


SELECT UPPER('DAniEl bLANk') AS 'upper'

, LOWER('DAniEl bLANk') AS 'lower'

, LTRIM('              Daniel Blank') AS 'ltrim' , RTRIM('Daniel Blank              ') AS 'rtrim'

, RTRIM(LTRIM('     DanielBlank     ')) AS 'l_and_r_trim'

, FORMAT(123456,'000000000') AS 'format'

[Screenshot: query results]

UPPER and LOWER allow you to format a string in either upper or lowercase characters.

LTRIM and RTRIM allow the removal of either leading or trailing spaces respectively.

FORMAT allows you to format an input value based on a format string.  So in the above example the number 123456 is formatted as a character string with a size of 9 characters with leading zeros.
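If FORMAT is not available to you (it arrived in SQL Server 2012), the same leading-zero padding can be sketched with two functions already covered in this post, REPLICATE and RIGHT. This is just one alternative, not the only way to do it.

```sql
SELECT FORMAT(123456, '000000000') AS 'format'                                    -- 000123456
    -- Prefix 9 zeros, then keep only the right-most 9 characters.
    , RIGHT(REPLICATE('0', 9) + CAST(123456 AS VARCHAR(9)), 9) AS 'manual_pad';   -- 000123456
```

Both columns return the same 9 character string.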

Outro

Strings are one of the fundamental data types that you will come across when working with databases.  Above I have covered some of the most common ways of dealing with this data type.  Hopefully someone out there finds it interesting / useful.  For me, it’s good to have this typed up to reinforce what I have been covering whilst studying for my MCSA.

Note: A lot of this information has been sourced from the official Microsoft 70-461 Exam Training Kit which I am currently working through.

#TSQL2sday – Recovering Orphaned Files with Powershell and SQL

As I’m relatively new to SQL, this’ll be my first post for TSQL2sday.  For those not in the know, TSQL2sday (or T-SQL Tuesday) was started by Adam Machanic and, I quote from this month’s host Rob Sewell, is :

“…a chance for you to join in the SQL Server community and write a blog post on a suggested topic. It makes for a great way to find a bunch of blog posts showing the same subject from many different viewpoints”

This month’s topic (if you haven’t already guessed from the title) is Powershell.

Now, Powershell is my 1st coding love. I’m not brilliant at it, but I’m definitely more than a beginner.  It’s what I spend most of my working life playing with, and it was the reason I eventually got into trying to learn me some SQL.  Bearing all this in mind, this was the first month that I’ve felt confident enough to join in, so here goes!

Creating a Scheduled Powershell Task to Clean up Orphaned Files in our SQL Database.

Catchy title, right?  Basically, what I have been playing with this month is using a combination of SQL and Powershell to help with a small mess that we have recently found ourselves in.  For this to make sense, I guess it’s going to need some sort of context.  I’ll try and be brief.

Context

I work at a relatively small but successful law firm, and our practice pretty much lives and dies with our Case Management System (CMS).  If you don’t know what that is then, suffice to say, it’s like an IT Service Desk tool on steroids, lots of steroids.  This software stores all documents, letters, emails and anything else that could possibly be associated with a legal case, and organises them all in some sort of logical fashion. We have literally millions of these files.  One of the main ways that these files are imported into the CMS is by an Outlook plugin that files the email and / or attachments.  The user is prompted to enter where they want to file them, and off they go.

We also use folder redirection for our users, so that their files are on a server rather than on their local computers.  It works pretty well, has some issues here or there, but it’s OK. Folder Redirection isn’t the star of this show.

Over the last couple of months, there have been a couple of small issues with the server that houses the folder redirection data.  This seems to have had a knock-on effect: emails filed into the CMS during a folder redirection blip are apparently lost.

What’s the Problem?

Well, each file in the CMS has a database entry that points to the file’s location on a server somewhere.  During a Folder Redirection blip, if you were to simply look at the CMS, it would appear that the files have been imported correctly. However, when trying to open them, you are greeted with “file can not be found”.  What seems to have happened is that the stored file path contains only the file name, not the crucial directory part.  After a lot of digging, these files (mostly) seem to have ended up in the root directory of the Folder Redirection share.

The first couple of times this happened, it took us ages to manually figure out where these were supposed to be filed, drag and drop the files there, and then manually update the paths.  The first time, this was irritating. The second time, infuriating. The third time led me to this post: I wanted to never have to do this again!

There are a few things to note about this issue. Firstly, not all of the files I find in the database turn up in the Folder Redirection share root. Secondly, not all files in the Folder Redirection share root belong to Database entries with missing paths.  Lastly, all files are associated with a legal case, which is identified by an entity (or “customer”) and a matter number.  An example of this would be Entity ABC123 Matter 6 or ABC123/6.  The files associated with this matter need to be stored in a path based on this, so for this matter the files would be stored in \\KSLFS01\ptrdata\docs\A\B\C\ABC123\6.  Getting each file into the correct folder was something I needed to figure out.

The solution

This is my wonderful solution.  Normally I’d use the code formatting of WordPress, but because it’s a Powershell script with SQL code in there, it just didn’t format well, so excuse the picture :

[Screenshot of the script, captioned “My wonderful solution!”]

This solution is essentially doing 3 separate functions combined together:

  1. Query the database to find all files in the database that meet my missing path criteria.
  2. Check the content of the Folder Redirection share root folder and check to see if the files are in there.
  3. If the files from the 1st step are found in the folder from the 2nd step, migrate the file to the correct location and update its path in the database.

SQL was obviously used to find the relevant files and then update the file path when required.  CONCAT and SUBSTRING were used to form the correct file path, and are things that I had not used before this.
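The script itself only survives as a picture, so the following is not the original code. It is a minimal sketch of how CONCAT and SUBSTRING could build the folder path from the ABC123/6 example given earlier. The variable names @EntityRef and @MatterNo are hypothetical, and the sketch assumes the entity code always starts with at least 3 characters.

```sql
-- Hypothetical inputs, taken from the ABC123/6 example.
DECLARE @EntityRef NVARCHAR(20) = N'ABC123';
DECLARE @MatterNo INT = 6;

-- The first three characters of the entity code each become a folder level.
SELECT CONCAT(
      N'\\KSLFS01\ptrdata\docs\'
    , SUBSTRING(@EntityRef, 1, 1), N'\'
    , SUBSTRING(@EntityRef, 2, 1), N'\'
    , SUBSTRING(@EntityRef, 3, 1), N'\'
    , @EntityRef, N'\'
    , @MatterNo          -- CONCAT converts the INT to a string implicitly
    ) AS 'target_folder';
-- Returns \\KSLFS01\ptrdata\docs\A\B\C\ABC123\6
```

CONCAT is handy here because it converts the matter number to a string for you, which the + operator would not.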

What I like about this solution is that it is now something that can be scheduled weekly on the SQL server, and hopefully means that this problem should never really bother us again.

Now, neither the Powershell code nor the SQL code is all that complex here, but this is the first time I’ve really tried to combine the two into some useful function.  The Powershell itself is pretty straightforward but the SQL did take me a bit of research / trial and error to get right (the evolution can be found in this Github Gist, and this Twitter thread).

Outro

Whether what I have come up with is the best way to resolve this issue or not, is not important here.  For me, this solution represents progress.  For the last 10 months I have been teaching myself SQL, and I’ve been trying to learn Powershell for a good 4 years (but only the last 2 seriously).  This is the first time I am really seeing the fruits of my labour.  The combination of using SQL and Powershell together made what would have been hours of tedious work into a relatively quick and painless process.

Going forward, I can see how combining SQL and Powershell could really open the doors to more solutions for problems I already have, problems I don’t know exist yet, or just improving some of my already existing solutions.  I want to start using both of these languages to the full.  I have already begun looking at resources such as DBA Tools, a great Powershell module for SQL admins, have started reading some books by Itzik Ben-Gan, and am working towards my MCSA.  I’m really hoping that this is the beginning of something special!

What Queries Are Currently Running on SQL Server

Sometimes, your SQL Server seems to grind to a halt.  The first thing you’ll probably want to do in this situation is try and see if there are any queries eating all the available resources.

You can find these by running this query :

SELECT sqltext.TEXT
, req.session_id
, req.status
, req.command
, req.cpu_time
, req.total_elapsed_time
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
AS sqltext

Once you have the list of running queries, you can kill the offending one by passing its session_id (in place of xxx) to KILL :

 KILL xxx 

And if you want to do some finger pointing, you can find out the offending user with this:

 EXEC sp_who 'xxx' 

Since finding this out (all credit goes to Pinal Dave) it has proved hugely helpful for me on a number of occasions!
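As a possible extension (my own sketch, not part of the query credited above), you can join to sys.dm_exec_sessions to see the login alongside each request, leave out your own session, and put the heaviest CPU users at the top. Querying these views needs the VIEW SERVER STATE permission.

```sql
SELECT req.session_id
    , ses.login_name              -- who is running it
    , ses.host_name               -- where it is running from
    , req.status
    , req.command
    , req.cpu_time
    , req.total_elapsed_time
    , req.blocking_session_id     -- non-zero when another session is blocking this one
    , sqltext.text
FROM sys.dm_exec_requests AS req
JOIN sys.dm_exec_sessions AS ses ON req.session_id = ses.session_id
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS sqltext
WHERE req.session_id <> @@SPID    -- leave out the session running this query
ORDER BY req.cpu_time DESC;
```

That gives you the session_id for KILL and the login name for the finger pointing in one result set.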