call us toll-free +1 855 855 3600
 
  • Home
  • >
  • Blog
  • >
  • So You Think You Can Search - Comparing Microsoft SQL Server FTS and Apache Lucene
DB Best Chronicles
 
Talks on Big Data, Mobile Apps, Web and Software Development

So You Think You Can Search – Comparing Microsoft SQL Server FTS and Apache Lucene

Posted by | On February 8th, 2013 | In Big Data, Web & Software Development -->
 

Prologue

In 1999, when I was in my first year of university, if I wanted to check my e-mail I had to come into a lab. I used to come into the class, open browser, type address and then go out to grab some coffee. Usually when I came back I was really happy to see that at least half of the page had been loaded.

Today people are not so patient. They used to get response from the web-sites at least in a few seconds and if your application is not that fast, you have a problem. In fact, “few seconds” is a very optimistic term. Nowadays we measure response time with milliseconds.

If you want to create something really popular in the Web (and your application operates big amount of data), sooner or later you will face the need to implement full text search engine over your system.

My Goals

In this post I will try to describe and compare two technologies – Microsoft SQL Server Full Text Search and Apache Lucene. The first one is an embedded SQL Server feature and the second one is a third-party software library, originally developed in Java and ported to many other platforms, including .NET.

lucene vs sql server fts

How to build a super fast search for your web application.

Introducing SQL Server Full Text Search

Microsoft SQL Server supports full text search since 1998, when version 7.0 was released. The best thing about it is that it is embedded part of SQL Server, which provides you all benefits of full integration with RDBMS.

So what are the benefits of using full text search over “like” operator? The main difference is in underlying index structure. Regular database index is built on the whole field value, full text search technique uses inverted index instead.  FTS involves indexing the individual words within a text field in order to make searching through many records quick (tokenization). Using “like” still requires a string search within the field. So specific index organization provides such benefits as “precision vs recall tradeoff”, high search performance, stemming, ranking and many others typical FTS features. You will have to create this index before you can start searching, and after that SQL Server will handle updating the index automatically.

How Lucene Search Works

Lucene Search also use inverted index, but since it is just a software library, you will have to manage index by yourself. This and other major differences between two technologies are shown in the table below:

Comparing MS SQL Full Text Search and Lucene

Lucene MS SQL FTS
Index auto update No Yes
Store data in index Yes No
Location in RAM Yes No
Interface API SQL
Queering multiple columns Yes Yes
Stop words, synonyms, sounds-like Yes Yes
Custom Index Documents Structure Yes No
Wildcards Yes With restrictions
Spellchecking, hit-highlighting and other extensions Provided in “contrib” extensions library No

So, MS FTS will handle updating the index. On the other hand, Lucene provides the opportunity to put it into a RAM and also allows to store data right in the index, which obviously increases the performance. MS FTS is a part of RDBMS so you will have to write SQL-queries (using specific FTS statements like CONTAINS, FREETEXT etc) and Lucene provides you an API.

Both of them support stop words, synonyms, stemming and multiple columns queering. Lucene allows you to use more query operators than FTS and also supports wildcards in the beginning and middle of the words, when FTS allows it only in the end. With Lucene you can choose any custom structure for your index (analyze and store data from different sources in one document); FTS allows you to create index only on one table (but you can use indexed views as a workaround).

Another great thing about Lucene is that many useful extensions like spellchecking or hits highlighting are already implemented and ready to use. But in case with SQL Server FTS you will have to create it by yourself (using CLR or client application code).

Getting to the Specifics

Some specific functionality you will have to implement manually using both Lucene and FTS. But there will be a difference in the approach and eventual performance. Let’s see it on a small example.

We took Wikipedia dump for our tests. Let’s assume, that we want to display most relevant articles that were recently modified in the top of search results.

SQL-statement for FTS will be look like that:

1
2
3
4
5
6
7
8
9
declare @today datetime2 = getdate(), @maxdaysago float = 100. , @multiplier float = 15.

select top 20 p.title, p.text

from CONTAINSTABLE(dbo.WikiArticles, text, 'cats') ft

join dbo.WikiArticles p  on ft.[KEY] = p.id

order by case when datediff(dd, p.date,  @today) > @maxdaysago then ft.rank else ft.rank* (1 + @multiplier * (( @maxdaysago - datediff(dd, p.date,  @today)) / @maxdaysago)) end desc

We have to join main table with CONTAINSTABLE results to get dates and then add to query complex “order by” clause, which makes it pretty hard for the SQL Server engine.

And here is the source code for Lucene:

(You can find formatted code in Appendix I in the bottom of the post)

We’ve created two new classes (derived from Lucene standard classes) and override just one standard method. Than we can use it in search queries like that:

1
2
3
4
5
6
7
8
IndexReader reader = IndexReader.Open(IndexDirectory, true);
IndexSearcher searcher = new IndexSearcher(reader);
searcher.SetDefaultFieldSortScoring(true, true);
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
Query query = parser.Parse("cats");
Query boostQuery = new RecencyBoostingQuery(query, 15, 100, "date");
Sort sort = new Sort(new SortField[]{SortField.FIELD_SCORE, new SortField("title", SortField.STRING)});
TopDocs hits = searcher.Search(boostQuery, null, 20, sort);

The only difference from a standard Lucene search in the listing above is using RecencyBoostingQuery with required boosting parameters instead of standard Query class.

We’ve tested these queries for SQL Server and Lucene on the same environment (Intel i5-3330 @ 3.0 GHz RAM: 8GB) and it appears that Lucene handles it at least 5 times faster. The results of this and other performance tests are shown in the tables below:

Indexing speed, size and single query execution time

Lucene MS SQL FTS
Indexing Speed 3 MB/sec 1 MB/sec
Index Size 10-25% 25-30%
Simple query <20 ms < 20 ms
Query With Custom Score < 4 sec >20 sec

Parallel Query Executions (10 threads, average execution time per query in ms)

MS SQL FTS Lucene (File System) Lucene (RAM)
Cold System Simple Query 56 643 21
Boost Query 19669* 859 27
Second executions Simple Query 14 8 <5
Boost Query 465 17 9

*average time, the very first query could be executed up to 2 min(!)

As you can see, SQL Server is a little faster with simple queries executed first time, and is way behind in case of complex query with custom scoring criteria. And Lucene with index placed in RAM is unbeatable favorite.

Conclusion

So if your performance requirements are critical and you have enough RAM to store the whole index, you probably have to choose Lucene as your search engine. Same choice will be reasonable when you are going to build a complex system with a lot of custom functionality like spellchecking, hit-highlighting, auto complete, advanced scoring etc. And if your search engine requirements are not so strong, maybe using SQL Server Full Text Search will be the best option – it is very simple to maintain and fast enough with simple queries.

Apache Lucene

Appendix I

Recency Boosting listing

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
public class RecencyBoostingQuery : CustomScoreQuery
{
public double Multipier { get; set; }
public int Today { get; set; }
public int MaxDaysAgo { get; set; }
public string DayField { get; set; }
public static int MsecPerDay = 1000*3600*24;

public RecencyBoostingQuery(Query subQuery, double multipier, int maxDaysAgo, string dayField)
: base(subQuery)
{
Today = (int) ((DateTime.Now.Ticks/TimeSpan.TicksPerMillisecond)/(MsecPerDay));
Multipier = multipier;
MaxDaysAgo = maxDaysAgo;
DayField = dayField;
}

protected override CustomScoreProvider GetCustomScoreProvider(IndexReader reader)
{
return new RecencyBooster(reader, this);
}
}

public class RecencyBooster : CustomScoreProvider
{
private readonly int[] _publishDay;
private readonly RecencyBoostingQuery _recencyBoostingQuery;

public RecencyBooster(IndexReader reader, RecencyBoostingQuery recencyBoostingQuery)
: base(reader)
{
_recencyBoostingQuery = recencyBoostingQuery;
_publishDay = FieldCache_Fields.DEFAULT.GetInts(reader, _recencyBoostingQuery.DayField);
}

public override float CustomScore(int doc, float subQueryScore, float valSrcScore)
{
int daysAgo = (_recencyBoostingQuery.Today - _publishDay[doc]);

if (daysAgo &lt; _recencyBoostingQuery.MaxDaysAgo)
{
var boost =
(float)
(_recencyBoostingQuery.Multipier*(_recencyBoostingQuery.MaxDaysAgo - daysAgo)/
_recencyBoostingQuery.MaxDaysAgo);

return (float) (subQueryScore*(1.0 + boost));
}

return subQueryScore;
}

 

SAP HANA – a New Database Management System

Posted by | On October 8th, 2012 | In Big Data, Database Migration -->
 

In 2007 I worked for a company which provided data warehousing solutions. Our biggest client database was about 2-3 TB which was not so much even five years ago, but because of extremely complicated ETL and complex reports we had a lot of troubles.

We spent hundreds of hours monthly on code optimization, but every 30 minutes saved on ETL and processing were compensated with monthly data growth and we had to start over and over.

Finally it comes the time when we couldn’t improve anything else in programming code and company decided to drastically upgrade the hardware. But extremely expensive servers and disk storages gives us only several percent performance improvement. We thought about migration to another platform but preliminary estimations showed that every traditional RDBMS has the same bottleneck – disk storage. Maybe, if it was happening today, we probably would turn to fundamentally new technologies.

The whole software development industry is highly innovation-oriented, but RDMBSs always are perceived as something very conservative. I still have record about ANSI-92 SQL knowledge in my CV and almost all fundamental works from 70s are up to date, but in spite of that data management software industry is very fast developing and great new products comes to market each year.

Our blog already contains articles written by Dale Anderson about NoSQL and column-oriented databases, now it’s time to talk about in-memory databases.

IMDB – It’s Not About Movies

In-memory database (IMDB) is a database management system that stores data entirely in main memory. It is contrasted with database management systems which employ a disk storage mechanism.

The idea is not so fresh. Actually, the first IMDB was created in 1991 (to they are even older than ANSI-92 from my CV), but in twenty years something has changed.

Comparing with 1991 RAM become 10000 cheaper and way faster. On this evidence, in-memory databases systems (which for a long time were used primary in RTOSs and other specific embedded systems) now can serve as a database for e-commerce, social networking and of course business intelligence solutions.  You can keep terabytes of data in main memory – it’s a low end of Big Data scale.

So, now IMDBs still don’t have a big market share, but last year’s industry’s major companies pay more attention to this technology. One of them, SAP AG, recently acquired Sybase, in 2010 released in-memory database called HANA, which is now in front and center of SAP’s agenda according to several industry experts. And it is the main subject of my post.

What Problems Does SAP HANA Solve?

What is SAP HANA, Anyway?

First of all, it is an in-memory database, with all corresponding benefits and disadvantages. As for benefits, obviously, working with all the data in memory is way faster than writing to and reading from file system (in certain circumstances – 100x or even 1000x times). Well, regular memory-mapped files are very fast too, but in-memory database is still a database. HANA is full ACID-compliant, supports multi-user access, high level data definition language, several programming interfaces, SQL and MDX, stored procedures and triggers, and more other traditional RDBMs tools and mechanisms.

New possibilities with SAP HANA

The common problem with ACID for most IMDBs is “D”. Main memory is volatile storage and loses its content when it is out of power. To avoid this HANA has Persistence Level component. Each committed transaction generates a log entry that is written to non-volatile storage. HANA stores changed pages in save points, which are asynchronously written to persistent storage in regular intervals (by default every 5 minutes). Persistence level is also used for whole database backup and restore.

I am not going to describe here whole IMDBs architecture and common principles, so let’s focus on specific HANA features: 

  • HANA comes shipped as a pre-configured appliance from hardware vendor and the license is bought from SAP.
  • HANA is positioned as single system for both OLTP and OLAP purpose and supports both columnar and row-based storage organization, so you can use benefits of each approach simultaneously. HANA even allows joining row-based tables with column-based tables. You can also alter an existing table from row-base to columnar and vice versa.
  • HANA supports parallel execution.
  • HANA is fully integrated with SAP business objects and other client tools like MS Excel, Dashboard Design Tool etc can also access HANA directly.
  • HANA supports ODBC, JDBC and ODBO drivers.
  • HANA has its own scripting language named SQLScript that is designed to enable optimizations and parallelization and is a collection of data, functional and procedural extensions.

I can continue the list, but maybe it would be better just to give you some useful links:

SAP provides 30 days trial access to a hosted HANA test and evaluation environment. Access is provided through remote desktops (by default you are offered to connect directly from web-browser but it’s much more convenient to use traditional way), so you don’t have to install anything on your desktop. You can also read a great blog about it before you start.

I am using this trial access for about a week. I’ve tried to reproduce some problems we have to solve with our client database from the beginning of my post and the results sometimes were very impressive. But of course it’s too early to make conclusions.

How to Implement a SQL Server View Based on Temporary Table

Posted by | On August 13th, 2012 | In Big Data, Database Migration -->
 

If you ask any experienced SQL Server developer: “Can I build a view on temporary table?” the answer will be “Of course not. And why on Earth you need it?” Well, Oracle developers can tell you why. Temporary tables are very useful when you have to separate data between users/sessions and creating view on them is just convenient way to store some logic. SQL Server developers used to deal without it but when they are trying to move database from Oracle they will have to find workaround because it’s pretty common pattern in Oracle.

Let’s take a look at how it works.

SQL Server View

First thing that comes to mind is to use regular table instead of temporary. To emulate data separation between sessions we can add “spid” column with default value @@spid (system function that returns the session ID of the current user process). Than we need to create view named exactly like original temporary table (of course we have to name our regular table differently) and add where condition “spid = @@spid”. So select from the view will return only that part of data, which was populated in current connection. Next challenge here is that in Oracle data from temporary table is automatically deleted on rollback/commit.

If we are working with regular SQL Server table we have to do it manually. In the best case it will be one entry point in original application code where we can simply add something like “delete temp_mock where spid = @@spid”, otherwise it could be difficult to find all references to the table and add delete statements there. But let’s assume that our original application architecture was good enough and purging table is not a challenge. What could be a challenge is concurrent access to data. Multiple queries on a same table could – and very likely will – cause problems with locks and deadlocks, which of course are solvable, but this is a topic for separate discussion. So main three steps of the algorithm are:

  1. Create table and view (very simple)
  2. Add “delete” statement to all required routines (could be a challenge)
  3. Fix locks/deadlocks issues (definitely would be a challenge)

In many cases this algorithm will be good enough but what if we have to migrate several hundreds of temporary tables and we want to make the process automated? CLR could be a good solution.

Let’s assume that we have to migrate simple temporary table #Orders with just three columns “ID”, “CustomerID” and “Date”, and a view build on that table:

1
2
3
4
5
6
7
8
create view v_OrdersByDate
as
select
c.name,
o.date
from #Orders o
join Customer c
on c.ID = o.CustomerID

If you try to create such view in SQL Server query will failed with error:

Views or functions are not allowed on temporary tables. Table names that begin with ‘#’ denote temporary tables.

All you need is just to wrap temporary table with CLR table-valued function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
public class Record
{
public SqlInt32 Id;
public SqlInt32 CustomerId;
public SqlDateTime Date;

public Record(SqlInt32 id, SqlInt32 customerId, SqlDateTime date)
{
Id = id;
CustomerId = customerId;
Date = date;
}
}
public static void FillRow(object obj, out SqlInt32 id, out SqlInt32 customerId, out SqlDateTime date)
{
var record = (Record)obj;
id = record.Id;
customerId = record.CustomerId;
date = record.Date;
}
[SqlFunction(DataAccess = DataAccessKind.Read, SystemDataAccess = SystemDataAccessKind.Read, FillRowMethodName= "FillRow", TableDefinition = "id int, CustomerID int, [Date] datetime")]
public static IEnumerable GetTempOrdersCLR()
{
var result = new ArrayList();

using (var connect = new SqlConnection("context connection=true"))
{
connect.Open();
using (var cmd = new SqlCommand("select id, customerID, [Date] from #Orders", connect))
{
using (var reader = cmd.ExecuteReader())
{
while (reader.Read())
result.Add(new Record(reader.GetSqlInt32(0), reader.GetSqlInt32(1), reader.GetSqlDateTime(2)));
}
}
}
return result;
}

And after that you can create view on this CLR function instead of temporary table:

1
2
3
4
5
6
7
8
create view v_OrdersByDate
as
select
c.name,
o.date
from [dbo].[GetTempOrdersCLR]() o
join Customer c
on c.ID = o.CustomerID

It will work.

Not it may seem that CLR is a silver bullet for dealing with almost all SQL Server limitations. Of course it is not.

CLR require some communication overhead. Performance tests show that CLR is way slower than direct select from temporary table:]

Records in #Orders GetTempOrdersCLR
(sec)
#Orders
(sec) 
1,000 0.020 0.003
10,000 0.050 0.003
100,000 0.500 0.020
1,000,000 5.000 0.100

These numbers does not mean that we can’t use CLR wrappers for temporary tables. First of all, usually these tables don’t contain millions of records. Second, we can try to optimize it. For example, we can add key parameter and use cross apply instead of join, if the amount of data in the left and right tables is very different. Here are results for 1 000 000 records in “#Orders” and 50 000 records in “Customer”.

A join #tab 0.100 sec
B join dbo. GetTempOrdersCLR () 5.000 sec
C cross apply dbo. GetTempOrdersCLR (c.CustomerId) 1.000 sec

 

So CLR wrapper is safe, fast to implement and could be very simply automated. Regular table algorithm is faster, but sometimes very complicated.

 

How to Create Autonomous Transactions in SQL Server

Posted by | On August 13th, 2012 | In Big Data, Database Migration -->
 

You probably heard about autonomous transactions in Oracle. In a few words, using just one pragma directive in a PL/SQL block you can isolate it from the callers’ context, so it becomes independent transaction. It’s quite useful if you want organize data audit and is pretty widely used in production databases. As you might guess these autonomous transactions do not have direct equivalent in Microsoft SQL Server. The only way to isolate a Transact-SQL block from a transaction context is to open a new connection. There are several options here, so let’s start with SSMA approach.

Autonomous Transactions in SQL Server

Assume that we have some logging logic in a stored procedure named “dbo.LogError” and we want to commit it even in case when main transaction will be roll backed.

As in previous case with UDF exceptions SSMA use extended procedure (now – to open a new connection). SSMA will wrap our routine like that:

1
2
3
4
5
6
7
EXECUTE master.dbo.xp_ora2ms_exec2_ex
@active_spid,
@login_time,
N'TEST_AT',
N'DBO',
N'LogError$IMPL',
N'false'

It was the only way to do that in SQL Server 2000 and it still works faster than other solutions, but there is a one big issue here – Microsoft announced that this feature is officially deprecated, so extended stored procedure is just not an option now.

Starting from SQL Server 2005 we can use CLR instead and in SQL Server 2008 version loopback linked server is appropriate too. You can read more about loopback approach in this article.

It’s pretty good primarily because it’s very simple. Once you have configured loopback server all you need is just change invocation T-SQL code from

1
exec dbo.LogError

to

1
exec loopback.DatabaseName.dbo.LogError

without any changes in target routine. But you have to keep in mind that using linked server in SQL Server always negatively affects performance. Calling routine throw the linked server is about 3-4x times slower than same invocation in current context. Well, it’s actually 1-2 milliseconds overhead per execution (or even less), but still, if you looking for a solution for very high loaded system, maybe you need something else.

SQLCLR might be good enough (this approach is recommend by Microsoft by the way). It still will affect performance, but a little less (up to 2-2.5x times slower). And if there are some computational tasks in your logic, CLR can provide additional performance here and maybe compensate calling overhead. Of course, writing CLR routines requires knowledge of C# or other .NET language. But if we just want to execute our stored procedure in new connection using CLR, .NET code will be very simple. If you are new to SQLCLR, you can start with this article.

Code of CLR wrapper for our “LogError” stored procedure will look like that:

1
2
3
4
5
6
7
8
9
10
[Microsoft.SqlServer.Server.SqlProcedure]
public static void ErrorLoggingCLR()
{
using(SqlConnection connection = new SqlConnection("context connection=true"))
{
connection.Open();
SqlCommand command = new SqlCommand("exec dbo.LogError", connection);
command.ExecuteNonQuery();
}
}

After you deploy this to SQL Server, you can call it just like usual SQL Server stored procedure. Just make sure that your server is configured properly (sp_configure ‘clr enabled’ returns “1” as running value).

How to Raise an Exception in SQL Server User Defined Functions

Posted by | On July 2nd, 2012 | In Database Migration -->
 

There are a lot of differences between Oracle and MS SQL Server and you will face many of them trying to move your database from one platform to another. If you use SSMA, which is a good thing to do, you can avoid huge amount of manual work, but you will have to solve some specific problems by yourself after SSMA job will be done. One of them can be raising errors from UDF, which you can easily do in Oracle and can’t in SQL Server without some workarounds because of T-SQL limitations for UDF.

SQL Server User Defined Functions

So let’s create some pretty simple UDF in Oracle and take a look on how we can keep its full functionality in MS SQL.

Oracle:

1
2
3
4
5
6
7
8
CREATE OR REPLACE FUNCTION DIVIDE (a int, b int)
RETURN NUMBER IS
BEGIN
IF (b=0) THEN
RAISE_APPLICATION_ERROR(-20002, 'You cannot divide by zero!');
END IF;
RETURN a/b;
END DIVIDE;

 

If you try to rewrite it in T-SQL just with obvious syntax and data types changes like that:

1
2
3
4
5
6
7
CREATE FUNCTION DIVIDE (@a int, @b int)
RETURNS float(53)
BEGIN
IF (@b = 0)
RAISERROR(59998, 16, 1, 'You cannot divide by zero!')
RETURN CAST(@a AS float(53)) / @b
END

you will get an error:

Msg 443, Level 16, State 14, Procedure DIVIDE, Line 5
Invalid use of a side-effecting operator ‘RAISERROR’ within a function.

Let’s see what SSMA does to avoid this (and many other) T-SQL restrictions for UDF. If you are new to SSMA you can start with this presentation by our CEO Dmitry Balin, which describes the main idea of migration process. It also should be enough to make first steps with SSMA.

So, first of all SSMA creates “sysdb” database with collection of auxiliary objects (like tables, SPs, UDFs etc). Part of them is just emulating Oracle build-in functionality and the other helps to deal with more complicated situations.

You can’t raise errors from UDF, but you can do that from a stored procedure, so SSMA creates one:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
CREATE PROCEDURE [dbo].[DIVIDE$IMPL]
@a int,
@b int,
/*
* SSMA warning messages:
* O2SS0356: Conversion from NUMBER datatype can cause data loss.
*/

@return_value_argument float(53) OUTPUT
AS
/*Generated by SQL Server Migration Assistant for Oracle version 5.2.1259.*/
BEGIN

IF (@b = 0)
BEGIN

DECLARE
@db_raise_application_error_message nvarchar(4000)

SET @db_raise_application_error_message = N'ORA' + CAST(-20002 AS nvarchar) + N': ' + N'You cannot divide by zero!'

RAISERROR(59998, 16, 1, @db_raise_application_error_message)

END

SET @return_value_argument = CAST(@a AS float(53)) / @b

RETURN /*EXCEPTION*/

END

It contains all the logic from the original UDF and next problem here is that you can’t invoke stored procedure from UDF. But you can execute an extended stored procedure and call regular stored procedure from it. “Sysdb” includes “xp_ora2ms_exec2_ex” extended stored procedure which is just a wrapper for calling regular stored procedures from UDF (you can read about this and other migration troubleshooting in SSMA White Paper).

So, the whole thing here is that we create stored procedure where the logic is implemented, wrap it with extended stored procedure and call the last from UDF. It looks like obvious overhead, causes performance issues and… doesn’t work. You can’t receive exception from extended SP and if you run the following query

1
select [dbo].[DIVIDE] (1,0)

it will return null.

This trick with extended procedure is good when you need to emulate other Oracle functionality which is forbidden in T-SQL, but it doesn’t work if you just need to raise error from UDF. So what is the solution?

The most known trick here is to force UDF fail with system exception:

1
2
3
4
5
6
7
ALTER FUNCTION DIVIDE (@a int, @b int)
RETURNS float(53)
BEGIN
IF (@b = 0)
return cast('You cannot divide by zero!' as int)
RETURN CAST(@a AS float(53)) / @b
END

 

It looks much less complicated than previous solution and it really works! Now if you run

1
select [dbo].[DIVIDE] (1,0)

 

again you will get result message like bellow:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value ‘You cannot divide by zero!’ to data type int.

Well, it doesn’t look perfect, but it’s definitely better than nothing. But what if for some reasons you have to specify error severity and state? There is another much more refined way to accomplish that. It’s similar to the first solution but now we will use OPENQUERY instead of extended procedure.

First of all, let’s create loopback linked server:

1
2
3
4
5
EXEC sp_addlinkedserver @server = N'loopback',
@srvproduct = N' ',
@provider = N'SQLNCLI',
@datasrc = N'Your server name',
@catalog = N'master'

 

After that we need a stored procedure (in difference from the first solution you have to create only one SP for raising errors from all your UDFs):

1
2
3
4
5
6
7
8
create proc raise_error
@errorstr varchar(max),
@errorsvr int = 11,
@errorstate int = 1
as
set nocount on
raiserror(@errorstr,@errorsvr,@errorstate);
select null as result;

 

And finally here is our “divide” function:

1
2
3
4
5
6
7
8
ALTER FUNCTION DIVIDE (@a int, @b int)
RETURNS float(53)
BEGIN
declare @fake_for_error int
IF (@b = 0)
select @fake_for_error = 0 from openquery(loopback, 'EXEC [Your Database Name].dbo.raise_error @errorstr = ''You cannot divide by zero'', @errorsvr = 18, @errorstate = 3')
RETURN CAST(@a AS float(53)) / @b
END

 

And if you run select again, you will get clear error message with custom severity and state values:

Msg 50000, Level 11, State 3, Line 1
You cannot divide by zero

But what about performance? Well, calling stored procedure through the linked server is a little overhead and if performance is critical you should use “cast message to int” trick instead. But in most cases query time will increase almost insensibly, so if you want to get a clean error message with custom state and severity levels you can use the loopback linked server version.