2008年2月26日 星期二

SQL Server row formatting using PIVOT, UNPIVOT

Readers are advised to know that the technique described in a previous artivle: SQL Server string column concatenation is also used in here.

Suppose we a table t1 as follows:




IDProperty1Property2Property3Property4
17v1,v3v2v3,v5v2,v4,v6


Another table t2 as follows:









ValueDescription
v1eye
v2mouth
v3ear
v4nose
v5neck
v6hair



And we would like our output to be like this instead:




IDProperty1Property2Property3Property4
17eye,earmouthear,neckmouth,nose,hair



That is, the list of values from t1 has to be translated into more readable contents according to the correspondences in t2. Here are the steps:

1. UNPIVOT t1, converting rows into columns
2. insert converted data, combining with the corresponding values from t2, into temporary table #tmp
3. perform string concatenation
4. PIVOT the resulting table back to it original form

Here are the SQL commands for step 1:

select property, value
from
(
select * from t1 where ID = '17'
) AS p
UNPIVOT
(
value
for property
in (Property1, Property2, Property3, Property4)
) AS unpvt

resulting table:







propertyvalue
Property1v1,v3
Property2v2
Property3v3,v5
Property4v2,v4,v6


Here are the complete SQL commands for step 1 and 2:

create table #tmp(tmp_property varchar(10), tmp_description varchar(10), tmp_list varchar(100))

insert into #tmp(tmp_property, tmp_description, tmp_list)
select property, Description, NULL
from
(
select property, value
from
(
select * from t1 where ID = '17'
) AS p
UNPIVOT
(
value
for property
in (Property1, Property2, Property3, Property4)
) AS unpvt
) AS tmp
inner join t2 on value like '%' + t2.Description + '%'

resulting table:











tmp_propertytmp_descriptiontmp_list
Property1eyeNULL
Property1earNULL
Property2mouthNULL
Property3earNULL
Property3neckNULL
Property4mouthNULL
Property4noseNULL
Property4hairNULL


step 3:

update #tmp
set @list = tmp_list = (CASE WHEN @last <> tmp_property THEN tmp_description ELSE @list + ',' + tmp_description END), @last = tmp_property

resulting table:











tmp_propertytmp_descriptiontmp_list
Property1eyeeye
Property1eareye,ear
Property2mouthmouth
Property3earear
Property3neckear,neck
Property4mouthmouth
Property4nosemouth,nose
Property4hairmouth,nose,hair


step 4:

select *
from
(
select '17' as ID, tmp_property, max(tmp_list) as tmp_list from #tmp_default group by tmp_property
) as p
PIVOT
(
max(tmp_list)
for tmp_property
in (Property1, Property2, Property3, Property4)
) as pvt

resulting table:




IDProperty1Property2Property3Property4
17eye,earmouthear,neckmouth,nose,hair

SQL Server string column concatenation

In SQL command, it is easy to perform the SUM aggregation operation on a column containing numeric data. However, it is not so easy to perform a similar operation, namely, string concatenation, on a column containing strings or text.

Here we describe a way to do string concatenation on a data column.

Say we have a table like the following:

table name: t1







IDValue
17v1
17v2
17v3
28v4
28v5


Sometimes we may want our output be like this instead:





IDValue
17v1,v2,v3
28v4,v5


We can use string concatenation technique to achieve the effect.

First we create a temporary storage table called #tmp, with columns tmp_ID, tmp_value, tmp_list. Then we insert data from t1 into #tmp, with the tmp_list column set to NULL, ordered by ID column. The order by criteria is crutial here, for all the data has to be in order for this algorithm to work. Usually we pick the column that serves as the identity column to be in the order by list. And the reason will be elaborated latter.

create table #tmp(tmp_ID varchar(2), tmp_value varchar(10), tmp_list varchar(100))

insert into #tmp select ID, Value, NULL from t1 order by ID

Now our table looks like the following:

table name: #tmp







tmp_IDtmp_valuetmp_list
17v1NULL
17v2NULL
17v3NULL
28v4NULL
28v5NULL


Then we do the following:

declare @list varchar(100)
declare @last varchar(10)
select @list = ''
select @last = ''

update #tmp
set @list = tmp_list = (CASE WHEN @last <> tmp_ID THEN tmp_value ELSE @list + ',' + tmp_value END), @last = tmp_value

In doing this, we set the values in tmp_list column to be the strings concatenated by all the string in tmp_value column in all the preceding rows. Note that whenever the tmp_ID column changes it value, we start the concatenation process all over again. Hence our table looks like the following now:








tmp_IDtmp_valuetmp_list
17v1v1
17v2v1,v2
17v3v1,v2,v3
28v4v4
28v5v4,v5


Now all we have to do is to select the values:

select tmp_ID, max(tmp_list) as tmp_list from #tmp group by tmp_ID

And we'll get the following result:





tmp_IDtmp_list
17v1,v2,v3
28v4,v5


As a side note, there is another way to do the string concatenation on a column.

declare @list varchar(100)
declare @list2 varchar(100)
select @list = ''
select @list2 = ''

select @list = @list+ Value + ',' from t1 where ID = '17'
select @list2 = @list2+ Value + ',' from t1 where ID = '28'

This will create the same results, too. But there is an obvious drawback. Imagine if you have many different ID values in t1, you have to hardcode it many times or put it in a loop, which will make the solution less concise and less desirable. Performance wise, these two are pretty much the same. Space wise, the first solution definitely requires more space.

.NET master pages

Master pages, as we know, are meant to shared amongst more than one content pages. Imagine the following scenario when we need different pages to edit or insert contents to the same database table, after the "Save" button is clicked on each page. Under the circumstances, it looks natural to put the "Save" button in the master page, because the button is a common element for the two pages.

But here comes the question, how do we make sure that different actions are performed after the button is clicked on different pages? That is, how do we make sure the data is inserted when the button is clicked on the insert page, and the data is edited when the same button is clicked on the edit page?

The trick is to put the declaration of the button in the master page, specifying all properties except for the click events. For example:

<asp:Button ID="btnSave" runat="server" CssClass="button" onmouseout="this.className='btn_mouseout'" onmouseover="this.className='btn_mouseover'" Text="存檔" />

Inside each content page, we specify the click event for the button.

protected void Page_Init(object sender, EventArgs e)
{
Control ctl = null;
ctl = Master.FindControl("btnSave");
if (ctl != null)
{
Button btnSave = (Button)ctl;
btnSave.Command += new CommandEventHandler(btnSave_Command);
}
}

And of course, the function performing the corresponding task is defined in each content page as well.

.NET Regular Expressions

Regular expression, in short, is a way to describe and identify patterns within text. It could be used in text search, pattern matching, text replacement, ...etc. We could imagine regular expression as the partial metadata of text.

Regular expression comes with a set of symbols and rules in order to describe text patterns. For example, [0-9] is used to describe any digit, while [^0-9] is used to describe any non-numeric character. Here are some rules we can use in regular expressions. (Please note that these are just grammars for regular expressions. We will see the actual applications latter.)

Pattern Description
Escaping
\ Escapes special characters to literal and literal characters to special.

E.g: /\(s\)/ matches '(s)' while /(\s)/ matches any whitespace and captures the match.
Quantifiers
{n}, {n,}, {n,m}, *, +, ? Quantifiers match the preceding subpattern a certain number of characters. The subpattern can be a single character, an escape sequence, a pattern enclosed by parentheses or a character set.

{n} matches exactly n times.
{n,} matches n or more times.
{n,m} matches n to m times.
* is short for {0,}. Matches zero or more times.
+ is short for {1,}. Matches one or more times.
? is short for {0,1}. Matches zero or one time.

E.g: /o{1,3}/ matches 'oo' in "tooth" and 'o' in "nose".
Pattern delimiters
(pattern), (?:pattern) Matches entire contained pattern.

(pattern) captures match.
(?:pattern) doesn't capture match

E.g: /(d).\1/ matches and captures 'dad' in "abcdadef" while /(?:.d){2}/ matches but doesn't capture 'cdad'.

Note: (?:pattern) is very badly supported in older browsers.
Lookaheads
(?=pattern), (?!pattern) A lookahead matches only if the preceeding subexpression is followed by the pattern, but the pattern is not part of the match. The subexpression is the part of the regular expression which will be matched.

(?=pattern) matches only if there is a following pattern in input.
(?!pattern) matches only if there is not a following pattern in input.

E.g: /Win(?=98)/ matches 'Win' only if 'Win' is followed by '98'.

Note: Support for lookaheads is lacking in most but the newest browsers.
Alternation
Alternation matches content on either side of the alternation character.

E.g: /(ab)a/ matches 'aa' in "dseaas" and 'ba' in "acbab".
Character sets
[characters], [^characters] Matches any of the contained characters. A range of characters may be defined by using a hyphen.

[characters] matches any of the contained characters.
[^characters] negates the character set and matches all but the contained characters

E.g: /[abcd]/ matches any of the characters 'a', 'b', 'c', 'd' and may be abbreviated to /[a-d]/. Ranges must be in ascending order, otherwise they will throw an error. (E.g: /[d-a]/ will throw an error.)
/[^0-9]/ matches all characters but digits.

Note: Most special characters are automatically escaped to their literal meaning in character sets.
Special characters
^, $, ., ? and all the highlighted characters above in the table. Special characters mean characters that match something else than what they appear as.

^ matches beginning of input (or new line with m flag).
$ matches end of input (or end of line with m flag).
. matches any character except a newline.
? directly following a quantifier makes the quantifier non-greedy (makes it match minimum instead of maximum of the interval defined).

E.g: /(.)*?/ matches nothing or '' in all strings.

Note: Non-greedy matches are not supported in older browsers such as Netscape Navigator 4 or Microsoft Internet Explorer 5.0.
Literal characters
All characters except those with special meaning. Mapped directly to the corresponding character.

E.g: /a/ matches 'a' in "Any ancestor".
Backreferences
\n Backreferences are references to the same thing as a previously captured match. n is a positive nonzero integer telling the browser which captured match to reference to.

/(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.
/<(\S+).*>(.*)<\/\1>/ matches any tag.

E.g: /<(\S+).*>(.*)<\/\1>/ matches '<div id="me">text</div>' in "text<div id=\"me\">text</div>text".
Character Escapes
\f, \r, \n, \t, \v, \0, [\b], \s, \S, \w, \W, \d, \D, \b, \B, \cX, \xhh, \uhhhh \f matches form-feed.
\r matches carrriage return.
\n matches linefeed.
\t matches horizontal tab.
\v matches vertical tab.
\0 matches NUL character.
[\b] matches backspace.
\s matches whitespace (short for [\f\n\r\t\v\u00A0\u2028\u2029]).
\S matches anything but a whitespace (short for [^\f\n\r\t\v\u00A0\u2028\u2029]).
\w matches any alphanumerical character (word characters) including underscore (short for [a-zA-Z0-9_]).
\W matches any non-word characters (short for [^a-zA-Z0-9_]).
\d matches any digit (short for [0-9]).
\D matches any non-digit (short for [^0-9]).
\b matches a word boundary (the position between a word and a space).
\B matches a non-word boundary (short for [^\b]).
\cX matches a control character. E.g: \cm matches control-M.
\xhh matches the character with two characters of hexadecimal code hh.
\uhhhh matches the Unicode character with four characters of hexadecimal code hhhh.

Regular expressions can be used with javascript, or in our case, .NET. Since regular expression does nothing more than describing certain patterns, it has to be used with available functions in order to be put to practices. Such functions can perform tasks on strings, replacing a pattern with another, or detect certain patterns to ensure correct input format...etc. See below for a list of available functions.

Description Example
RegExp.exec(string)
Applies the RegExp to the given string, and returns the match information. var match = /s(amp)le/i.exec("Sample text")

match then contains ["Sample","amp"]
RegExp.test(string)
Tests if the given string matches the Regexp, and returns true if matching, false if not. var match = /sample/.test("Sample text")

match then contains false
String.match(pattern)
Matches given string with the RegExp. With g flag returns an array containing the matches, without g flag returns just the first match or if no match is found returns null. var str = "Watch out for the rock!".match(/r?or?/g)

str then contains ["o","or","ro"]
String.search(pattern)
Matches RegExp with string and returns the index of the beginning of the match if found, -1 if not. var ndx = "Watch out for the rock!".search(/for/)

ndx then contains 10
String.replace(pattern,string)
Replaces matches with the given string, and returns the edited string. var str = "Liorean said: My name is Liorean!".replace(/Liorean/g,'Big Fat Dork')

str then contains "Big Fat Dork said: My name is Big Fat Dork!"
String.split(pattern)
Cuts a string into an array, making cuts at matches. var str = "I am confused".split(/\s/g)

str then contains ["I","am","confused"]

Here is an actual example using regular expression. In this example, we try to retrieve all 6-digit employee #'s inside parenthesis, exclude parenthesis and saperate employee #'s by comma. Therefore we try to match the string with a a pattern containing anything but series of 6-digit numbers inside parenthesis and commas, and delete everything else except for those matching the pattern.

A sample input will be like: 三趨一(526274),三趨二(526275),三趨三(526279)

The regular expression is: ([^0-9,])*(?=[0-9]{6})\)

We first specify a 6-digit number by [0-9]{6}; [0-9], as we say before, matches a digit, while {6} means matching of the pattern 6 times. (?=[0-9]{6}) indicates a pattern matches only when it's followed by a 6-digit number. ([^0-9,])* represents any number of characters excluding numeric digits and commas. These two combined, it means any number of characters excluding numeric digits and commas followed by a 6-digit number. \) matches a single right parenthesis, and means the "or" operation. Hence the whole regular expression means any number of characters excluding numeric digits and commas followed by a 6-digit number, or a single right parenthesis.

The idea to delete everything matching the pattern, which will leave us only 6-digit numbers and commas, is to replace the pattern with empty strings.

The complete code looks like:

Regex.Replace("三趨一(526274),三趨二(526275),三趨三(526279)", @"([^0-9,])*(?=[0-9]{6})\)", "");

For those who are interested in regular expression with javascript, check out the following websites.

http://www.evolt.org/article/Regular_Expressions_in_JavaScript/17/36435/

http://www.sitepoint.com/print/expressions-javascript