Friday, January 17, 2014

T-SQL: Comma Separating Values into Fields

Level: 2 where 1 is noob and 5 is totally awesome
System: SQL Server from 2005 and up


Visualising data


When querying data from databases, I'm always trying to get resultsets which are most presentable as possible. It save me a lot of time, because I then don't need to transform data further. The upsides are, I can either save a resultset as csv or just copy paste it to some spreadsheet or to a mail, right after the querying. The down side is, I have to type a bit more SQL.

In this blog post, I'll present one of my favorites methods, to easily visualize data in a smart way. I'm going to explain the method, using these 2 tables:

Items:
Id          Name                                               ColorId
----------- -------------------------------------------------- -----------
1           Cranberries                                        1
2           Plants                                             2
3           Shrimps                                            1
4           Ocean                                              3
5           Sky                                                3
6           Roses                                              1  

And

Colors:
Id          Color
----------- --------------------------------------------------
1           Red
2           Green
3           Blue

The most used way to show, the color for each item would be to join the tables like:

 SELECT Items.Id, Items.Name, Color FROM Items  
 INNER JOIN Colors ON Items.ColorId = Colors.Id  

Which gives:

Name                                               Color
-------------------------------------------------- -----------------
Cranberries                                        Red
Plants                                             Green
Shrimps                                            Red
Ocean                                              Blue
Sky                                                Blue
Roses                                              Red

It is readable, and if we sort by color it might be more readable. It might be most readable, if we could group the items by color. It requires a bit more SQL; but the result is better:

 SELECT   
  STUFF(  
      (SELECT ', ' + Name  
      FROM Items  
      where Items.ColorId = Colors.Id  
      FOR XML PATH (''))  
      , 1, 1, '') AS Items,  
 Color  
 FROM Colors;  

Gives:

Items                                Color
------------------------------------ ------------------------------------------
Cranberries, Shrimps, Roses          Red
Plants                               Green
Ocean, Sky                           Blue

Explanation


The magic happens in the inner SQL. FOR XML PATH is used as a smart way for concatenating string columns to one string column. By adding the ',' to Name, we are getting a comma separated list, where the first char is ','. For more about FOR XML PATH see: http://technet.microsoft.com/en-us/library/ms189885.aspx

STUFF are used to remove the first char(in this case ','), from the comma separated lists. Stuff is a sql command for inserting strings into strings, but is can also remove chars. In this case we are overwriting first char with a empty string. For more about STUFF see: http://technet.microsoft.com/en-us/library/ms188043.aspx

The rest should be self explanatory :-)


4 comments:

  1. Why STUFF? Substring is simpler and everybody knows

    ReplyDelete
    Replies
    1. That is a very good question indeed. The answer is:

      Looking at substrings parameters:

      SUBSTRING ( expression ,start , length )

      You will notice, that substring requires a length. Providing the length of the expression in this case is quite difficult. The problem is, we are not storing the string from the inner select, so can't use LEN(). You could extends the snippet, to store the string from the inner select, and then use SUBSTRING combined with LEN(stored string), but then you are increasing complexity, and losing the compactness of the snippet.

      Delete
  2. In this case I just use 999999999999 for length

    ReplyDelete
    Replies
    1. It is also a way to do it :-)

      I'm not sure it is pretty solution in my book, because there is a t-sql function(STUFF), which can do the work without i have to provide a magic number. But it all comes down to taste

      According to the query plan and statistics, the cost is more or less the same.

      So _performance_ wise, I can't find any evidence for using STUFF over SUBSTRING, in this case.

      Delete