T-SQL: Using PATINDEX to Extract Digits from the Middle of a String

Suppose you have a string that starts and ends in non-digits and has a series of digits in the middle e.g ‘A$B1234C*DE&’. The SQL below extracts the digits ‘1234’.

DECLARE @TestString VARCHAR(30) =  'A$B1234C*DE&'; -- non-digits & digits & non-digits
 
SELECT
      @TestString AS 'Original_string'
      , PATINDEX('%[0-9]%',@TestString) AS 'Index_first_digit'
      , LEN(@TestString) - PATINDEX('%[0-9]%',REVERSE(@TestString)) + 1 AS 'Index_last_digit'
      , LEN(@TestString) - PATINDEX('%[0-9]%',@TestString) - PATINDEX('%[0-9]%',REVERSE(@TestString)) + 2 AS 'Number_of_digits'
      , SUBSTRING(@TestString
                        , PATINDEX('%[0-9]%',@TestString)
                        , LEN(@TestString) - PATINDEX('%[0-9]%',@TestString) - PATINDEX('%[0-9]%',REVERSE(@TestString)) + 2) AS 'Digits_only'

Output:

Original_string Index_first_digit Index_last_digit Number_of_digits Digits_only
A$B1234C*DE& 4 7 4 1234

T-SQL: Count Number of Times a Particular Character or Substring occurs in a String

To count the number of times a particular substring (e.g. ‘Two’) occurs in a string (e.g. ‘One, Two, One, Two’), follow these steps:

1. Replace each occurrence of the substring by an empty string e.g.

SELECT REPLACE('One, Two, One, Two','Two','')

2. Calculate the length of the new string (12 characters) e.g.

SELECT LEN(REPLACE('One, Two, One, Two','Two',''))

3. Subtract this from the length of the original string (18 characters) e.g.

SELECT LEN('One, Two, One, Two') - LEN(REPLACE('One, Two, One, Two','Two',''))

4. Divide the result (18 – 12 = 6) by the number of characters in the substring (6/3 = 2 occurrences) e.g.

SELECT (LEN('One, Two, One, Two') - LEN(REPLACE('One, Two, One, Two','Two','')))/LEN('Two')

The count can be made case sensitive by applying the appropriate collation e.g.

SELECT (LEN('One, Two, One, Two') - LEN(REPLACE('One, Two, One, Two' COLLATE Latin1_General_CS_AS,'Two','')))/LEN('Two')

In Excel we can use a very similar formula e.g. if our string “One, Two, One, Two” is in cell A1:

=(LEN(A1)-LEN(SUBSTITUTE(A1,"Two","")))/LEN("Two")

T-SQL: BULK INSERT of Files with only Line Feed as the Row Terminator

Windows uses a carriage return ‘\r’ and a line feed character ‘\n’ to terminate lines. However, UNIX uses only a line feed character.

If you try to import a UNIX output file into a SQL Server database table using the BULK INSERT statement with ‘\n’ as the row terminator, you will encounter the following error:

Msg 4866, Level 16, State 1, Line 2
The bulk load failed. The column is too long in the data file for row 1, column 2. Verify that the field terminator and row terminator are specified correctly.

As explained on technet.microsoft.com the reason is that the ‘\n’ in the BULK INSERT statement is automatically replaced with ‘\r\n’. There are two solutions:

SOLUTION 1: Use the hexadecimal value ‘0x0a’ for a line feed character as the row terminator as explained on stackoverflow.com e.g.

BULK INSERT tablename
FROM 'unixoutputfile.txt'
WITH
(
    FIRSTROW = 1
    , FIELDTERMINATOR = '\t'
    , ROWTERMINATOR = '0x0a'
);

SOLUTION 2: As explained on technet.microsoft.com the CHAR(10) character can also be used to specify only a line feed character. However, in this case the BULK INSERT statement must be written as a string and then passed to the EXEC statement e.g.

DECLARE @bulk_cmd varchar(1000);
SET @bulk_cmd = 'BULK INSERT tablename
FROM ''unixoutputfile.txt'' 
WITH (ROWTERMINATOR = '''+CHAR(10)+''')';
EXEC(@bulk_cmd);

T-SQL: Field Terminator of a BULK INSERT occurs within a Data Field

Suppose you want to import data from an Excel worksheet into a SQL Server database table using the BULK INSERT statement. The first step would be to save the worksheet as a CSV file. You could then run the BULK INSERT statement with

FIELDTERMINATOR = ','

and

ROWTERMINATOR = '\n'

However, if the field terminator (a comma) occurs within any data field, you’ll get the error

Msg 4864, Level 16, State 1, Line 4
Bulk load data conversion error (type mismatch or invalid character for the specified codepage)…

One solution is to re-save the Excel worksheet as a CSV file but with a list separator that does not occur within any of the data fields. To change the list separator that Excel uses, follow these steps

  1. Open the Control Panel
  2. Navigate to Region and Language
  3. Under Formats, Additional Settings, you should find List Separator
  4. Change the List Separator to something that does not occur within any of the data fields

ListSeparatorOption

T-SQL: Converting a String to a Date

The SQL query

SELECT CONVERT(DATE, '31/12/2013')

fails in SQL Server with the error message

Msg 241, Level 16, State 1, Line 1
Conversion failed when converting date and/or time from character string.

To fix this we need to specify the format the string is in. We do this by passing the appropriate integer value as an argument to the style parameter e.g.

SELECT CONVERT(DATE, '31/12/2013', 103)

Refer to CAST and CONVERT (Transact-SQL) for a full list of style codes.