2008年2月26日 星期二

SQL Server string column concatenation

In SQL command, it is easy to perform the SUM aggregation operation on a column containing numeric data. However, it is not so easy to perform a similar operation, namely, string concatenation, on a column containing strings or text.

Here we describe a way to do string concatenation on a data column.

Say we have a table like the following:

table name: t1







IDValue
17v1
17v2
17v3
28v4
28v5


Sometimes we may want our output be like this instead:





IDValue
17v1,v2,v3
28v4,v5


We can use string concatenation technique to achieve the effect.

First we create a temporary storage table called #tmp, with columns tmp_ID, tmp_value, tmp_list. Then we insert data from t1 into #tmp, with the tmp_list column set to NULL, ordered by ID column. The order by criteria is crutial here, for all the data has to be in order for this algorithm to work. Usually we pick the column that serves as the identity column to be in the order by list. And the reason will be elaborated latter.

create table #tmp(tmp_ID varchar(2), tmp_value varchar(10), tmp_list varchar(100))

insert into #tmp select ID, Value, NULL from t1 order by ID

Now our table looks like the following:

table name: #tmp







tmp_IDtmp_valuetmp_list
17v1NULL
17v2NULL
17v3NULL
28v4NULL
28v5NULL


Then we do the following:

declare @list varchar(100)
declare @last varchar(10)
select @list = ''
select @last = ''

update #tmp
set @list = tmp_list = (CASE WHEN @last <> tmp_ID THEN tmp_value ELSE @list + ',' + tmp_value END), @last = tmp_value

In doing this, we set the values in tmp_list column to be the strings concatenated by all the string in tmp_value column in all the preceding rows. Note that whenever the tmp_ID column changes it value, we start the concatenation process all over again. Hence our table looks like the following now:








tmp_IDtmp_valuetmp_list
17v1v1
17v2v1,v2
17v3v1,v2,v3
28v4v4
28v5v4,v5


Now all we have to do is to select the values:

select tmp_ID, max(tmp_list) as tmp_list from #tmp group by tmp_ID

And we'll get the following result:





tmp_IDtmp_list
17v1,v2,v3
28v4,v5


As a side note, there is another way to do the string concatenation on a column.

declare @list varchar(100)
declare @list2 varchar(100)
select @list = ''
select @list2 = ''

select @list = @list+ Value + ',' from t1 where ID = '17'
select @list2 = @list2+ Value + ',' from t1 where ID = '28'

This will create the same results, too. But there is an obvious drawback. Imagine if you have many different ID values in t1, you have to hardcode it many times or put it in a loop, which will make the solution less concise and less desirable. Performance wise, these two are pretty much the same. Space wise, the first solution definitely requires more space.

沒有留言: