Wednesday, 22 February 2012

Deleting Duplicate Records Using Common Table Expression (CTE) Feature of SQL Server 2005


Introduction

Deleting duplicate records from an SQL Server table can sometime become a very tedious activity especially in scenarios when you don’t have a Primary Key defined for the source table. In this article we will take advantage of Common Table Expression (CTE) feature of SQL Server 2005 to delete duplicate records from the table.

What is Common Table Expression (CTE)?

A common table expression (CTE) can be considered as a temporary result set which is defined within the execution scope of a single INSERT, UPDATE, SELECT OR DELETE statements. It is much similar to a derived table. However, a CTE can be self referenced multiple times within the scope of the same query.

Deleting Duplicate Records Using Common Table Expression (CTE) Feature of SQL Server 2005

Let us go through the below mentioned example to remove duplicates from a table which doesn’t have a Primary Key defined.
 /* Sample Script to Deleting Duplicate Records Using CTE */
USE TempDB
GO
/* Drop EmployeeDetails Table if already exists */IF OBJECT_ID (N'dbo.EmployeeDetails', N'U') IS NOT NULL
    DROP TABLE dbo.EmployeeDetails;
GO 
/* Create EmployeeDetails Details Table*/
CREATE TABLE EmployeeDetails
(
EmpID       INT IDENTITY (1,1),Name        VARCHAR(25),Age         INT)
GO
/* Insert Script is using Row Value Constructor Feature of SQL Server 2008*/

INSERT
 INTO EmployeeDetails 
VALUES 
       ('Lilly', 25)      ,('Lucy',  25)      ,('Lilly', 25)
      ,('Mary',  26)      ,('Mariam',26)      ,('Mary',  26)      ,('Lisa',  27)
GO
/* Using Common Table Expression Feature to Delete Duplicate Records */
WITH SampleCTE 
AS 
(
      SELECT ROW_NUMBER () OVER ( PARTITION BY NAME, AGE ORDER BY EmpID) AS RNUM 
      FROM EmployeeDetails 

)
DELETE FROM SampleCTE WHERE RNUM > 1
GO
/* Employee Details without Duplicates */
SELECT * FROM EmployeeDetails
GO

sql server interview question andswer


1. What are the basic functions for master, msdb, model, tempdb and resource databases?
  1. The master database holds information for all databases located on the SQL Server instance and is theglue that holds the engine together. Because SQL Server cannot start without a functioning masterdatabase, you must administer this database with care.
  2. The msdb database stores information regarding database backups, SQL Agent information, DTS packages, SQL Server jobs, and some replication information such as for log shipping.
  3. The tempdb holds temporary objects such as global and local temporary tables and stored procedures.
  4. The model is essentially a template database used in the creation of any new user database created in the instance.
  5. The resoure Database is a read-only database that contains all the system objects that are included with SQL Server. SQL Server system objects, such as sys.objects, are physically persisted in the Resource database, but they logically appear in the sys schema of every database. The Resource database does not contain user data or user metadata.
2. What is Service Broker?
Service Broker is a message-queuing technology in SQL Server that allows developers to integrate SQL Server fully into distributed applications. Service Broker is feature which provides facility to SQL Server to send an asynchronous, transactional message. it allows a database to send a message to another database without waiting for the response, so the application will continue to function if the remote database is temporarily unavailable.
3. Where SQL server user names and passwords are stored in SQL server?
They get stored in System Catalog Views sys.server_principals and sys.sql_logins.
4. What is Policy Management?
Policy Management in SQL SERVER 2008 allows you to define and enforce policies for configuring and managing SQL Server across the enterprise. Policy-Based Management is configured in SQL Server Management Studio (SSMS). Navigate to the Object Explorer and expand the Management node and the Policy Management node; you will see the Policies, Conditions, and Facets nodes.
5. What is Replication and Database Mirroring?
Database mirroring can be used with replication to provide availability for the publication database. Database mirroring involves two copies of a single database that typically reside on different computers. At any given time, only one copy of the database is currently available to clients which are known as the principal database. Updates made by clients to the principal database are applied on the other copy of the database, known as the mirror database. Mirroring involves applying the transaction log from every insertion, update, or deletion made on the principal database onto the mirror database.
6. What are Sparse Columns?
A sparse column is another tool used to reduce the amount of physical storage used in a database. They are the ordinary columns that have an optimized storage for null values. Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve nonnull values.

7. What does TOP Operator Do?
The TOP operator is used to specify the number of rows to be returned by a query. The TOP operator has new addition in SQL SERVER 2008 that it accepts variables as well as literal values and can be used with INSERT, UPDATE, and DELETES statements.
8. What is CTE?
CTE is an abbreviation Common Table Expression. A Common Table Expression (CTE) is an expression that can be thought of as a temporary result set which is defined within the execution of a single SQL statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query.
9. What is MERGE Statement?
MERGE is a new feature that provides an efficient way to perform multiple DML operations. In previous versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions, but now, using MERGE statement we can include the logic of such data modifications in one statement that even checks when the data is matched then just update it and when unmatched then insert it. One of the most important advantages of MERGE statement is all the data is read and processed only once.
10. What is Filtered Index?
Filtered Index is used to index a portion of rows in a table that means it applies filter on INDEX which improves query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes. When we see an Index created with some where clause then that is actually a FILTERED INDEX.
11. Which are new data types introduced in SQL SERVER 2008?
  1. The GEOMETRY Type: The GEOMETRY data type is a system .NET common language runtime (CLR) data type in SQL Server. This type represents data in a two-dimensional Euclidean coordinate system.
  2. The GEOGRAPHY Type: The GEOGRAPHY datatype’s functions are the same as with GEOMETRY. The difference between the two is that when you specify GEOGRAPHY, you are usually specifying points in terms of latitude and longitude.
  3. New Date and Time Datatypes: SQL Server 2008 introduces four new datatypes related to date and time: DATE, TIME, DATETIMEOFFSET, and DATETIME2.
    1. DATE: The new DATE type just stores the date itself. It is based on the Gregorian calendar and handles years from 1 to 9999.
    2. TIME: The new TIME (n) type stores time with a range of 00:00:00.0000000 through 23:59:59.9999999. The precision is allowed with this type. TIME supports seconds down to 100 nanoseconds. The n in TIME (n) defines this level of fractional second precision, from 0 to 7 digits of precision.
    3. The DATETIMEOFFSET Type: DATETIMEOFFSET (n) is the time-zone-aware version of a datetime datatype. The name will appear less odd when you consider what it really is: a date + a time + a time-zone offset. The offset is based on how far behind or ahead you are from Coordinated Universal Time (UTC) time.
    4. The DATETIME2 Type: It is an extension of the datetime type in earlier versions of SQL Server. This new datatype has a date range covering dates from January 1 of year 1 through December 31 of year 9999. This is a definite improvement over the 1753 lower boundary of the datetime datatype. DATETIME2 not only includes the larger date range, but also has a timestamp and the same fractional precision that TIME type provides
12. What are the Advantages of using CTE?
  1. Using CTE improves the readability and makes maintenance of complex queries easy.
  2. The query can be divided into separate, simple, logical building blocks which can be then used to build more complex CTEs until final result set is generated.
  3. CTE can be defined in functions, stored procedures, triggers or even views.
  4. After a CTE is defined, it can be used as a Table or a View and can SELECT, INSERT, UPDATE or DELETE Data.
13. What is CLR?
In SQL Server 2008, SQL Server objects such as user-defined functions can be created using such CLR languages. This CLR language support extends not only to user-defined functions, but also to stored procedures and triggers. You can develop such CLR add-ons to SQL Server using Visual Studio 2008.
14. What are synonyms?
Synonyms give you the ability to provide alternate names for database objects. You can alias object names; for example, using the Employee table as Emp. You can also shorten names. This is especially useful when dealing with three and four part names; for example, shortening server.database.owner.object to object.
15. What is LINQ?
Language Integrated Query (LINQ) adds the ability to query objects using .NET languages. The LINQ to SQL object/relational mapping (O/RM) framework provides the following basic features:
  1. Tools to create classes (usually called entities) mapped to database tables
  2. Compatibility with LINQ's standard query operations
  3. The DataContext class, with features such as entity record monitoring, automatic SQL statement generation, record concurrency detection, and much more
16. What is Isolation Levels?
Transactions specify an isolation level that defines the degree to which one transaction must be isolated from resource or data modifications made by other transactions. Isolation levels are described in terms of which concurrency side-effects, such as dirty reads or phantom reads, are allowed.
Transaction isolation levels control:
  1. Whether locks are taken when data is read, and what type of locks are requested.
  2. How long the read locks are held.
  3. Whether a read operation referencing rows modified by another transaction:
    1. Blocks until the exclusive lock on the row is freed.
    2. Retrieves the committed version of the row that existed at the time the statement or transaction started.
    3. Reads the uncommitted data modification.
17. What is use of EXCEPT Clause?
EXCEPT clause is similar to MINUS operation in Oracle. The EXCEPT query and MINUS query returns all rows in the first query that are not returned in the second query. Each SQL statement within the EXCEPT query and MINUS query must have the same number of fields in the result sets with similar data types.
18. How would you handle error in SQL SERVER 2008?
SQL Server now supports the use of TRY...CATCH con handling. TRY...CATCH lets us build error handling at the level we need, in the way w to, by setting a region where if any error occurs, it will break out of the region and head to an error handler.

The basic structure is as follows:
BEGIN TRY
stmts..
END TRY
BEGIN CATCH
stmts..
END CATCH

19. What is RAISEERROR?
RaiseError generates an error message and initiates error processing for the session. RAISERROR can either reference a user-defined message stored in the sys.messages catalog view or build a message dynamically. The message is returned as a server error message to the calling application or to an associated CATCH block of a TRY | CATCH construct.
20. How to rebuild Master Database?
Master database is system database and it contains information about running server's configuration. When SQL Server 2005 is installed it usually creates master, model, msdb, tempdb resource and distribution system database by default. Only Master database is th one which is absolutely must have database. Without Master database SQL Server cannot be started. This is the reason it is extremely important to backup Master database.
To rebuild the Master database, Run Setup.exe, verify, and repair a SQL Server instance, and rebuild the system databases. This procedure is most often used to rebuild the master database for a corrupted installation of SQL Server.
21. What is XML Datatype?
The xml data type lets you store XML documents and fragments in a SQL Server database. An XML fragment is an XML instance that is missing a single top-level element. You can create columns and variables of the xml type and store XML instances in them. The xml data type and associated methods help integrate XML into the relational framework of S Server.
22. What is Data Compression?
In SQL SERVE 2008 Data Compression comes in two flavors:
  1. Row Compression: Row compression changes the format of physical storage of data. It minimize the metadata (column information, length, offsets etc) associated with each record. Numeric data types and fixed length strings are stored in variable-length storage format, just like Varchar.
  2. Page Compression: Page compression allows common data to be shared between rows for a given page. Its uses the following techniques to compress data:
    1. Row compression.
    2. Prefix Compression. For every column in a page duplicate prefixes are identified. These prefixes are saved in compression information headers (CI) which resides after page header. A reference number is assigned to these prefixes and that reference number is replaced where ever those prefixes are being used.
  3. Dictionary Compression: Dictionary compression searches for duplicate values throughout the page and stores them in CI. The main difference between prefix and dictionary compression is that prefix is only restricted to one column while dictionary is applicable to the complete page.
23. What is Catalog Views?
Catalog views return information that is used by the SQL Server Database Engine. Catalog Views are the most general interface to the catalog metadata and provide the most efficient way to obtain, transform, and present customized forms of this information. All user- available catalog metadata is exposed through catalog views.
24. What is PIVOT and UNPIVOT?
Pivot Table can automatically sort, count, and total the data stored in one table or spreadsheet and create a second table displaying the summarized data. The PIVOT operator turns the values of a specified column into column names, effectively rotating a table.
UNPIVOT table is reverse of PIVOT Table.

25. What is Dirty Read ?
A dirty read occurs when two operations say, read and write occurs together giving the incorrect or unedited data. Suppose, A has changed a row, but has not committed the changes. B reads the uncommitted data but his view of the data may be wrong so that is Dirty Read.
26. What is Aggregate Functions?
Aggregate functions perform a calculation on a set of values and return a single value. Aggregate functions ignore NULL values except COUNT function. HAVING clause is used, along with GROUP BY, for filtering query using aggregate values.

Following functions are aggregate functions.

AVG, MIN CHECKSUM_AGG, SUM, COUNT, STDEV, COUNT_BIG, STDEVP, GROUPING, VAR, MAX. VARP
27. What do you mean by Table Sample?
TABLESAMPLE allows you to extract a sampling of rows from a table in the FROM clause. The rows retrieved are random and they are not in any order. This sampling can be based on a percentage of number of rows. You can use TABLESAMPLE when only a sampling of rows is necessary for the application instead of a full result set.
28. What is the difference between UNION and UNION ALL?
  1. UNION The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
  2. UNION ALL The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
29. What is B-Tree?
The database server uses a B-tree structure to organize index information. B-Tree generally has following types of index pages or nodes:
  1. root node: A root node contains node pointers to branch nodes which can be only one.
  2. branch node: A branch node contains pointers to leaf nodes or other branch nodes which can be two or more.
  3. leaf nodes: A leaf node contains index items and orizantal pointers to other leaf nodes which can be many.

Optimize SQL Server queries with these advanced tuning techniques


The best way to tune performance is to try to write your queries in a number of different ways and compare their reads and execution plans. Here are various techniques that you can use to try to optimize your database queries.

Now that you know how to speed up your SQL Server database queries, you can start delving into some of the more advanced tuning options.
The best way to tune performance is to try to write your queries in a number of different ways and compare their reads and execution plans. Here are various techniques that you can use to try to optimize your database queries.

Use JOINs rather than subqueries

If possible (and if it makes sense), I suggest using JOIN statements rather than subqueries to improve performance. When a subquery is used as criteria in a SELECT statement, the values returned from the subquery are distinct. Returning a distinct list of values requires additional processing, which can slow down your queries.

Use explicit transactions

When data manipulation occurs in the database, the actions are written to the transaction log. If your statements are executing many DML statements, it might be a good idea to place them inside of a transaction for performance purposes. Placing the statements inside of a transaction will prevent all of the statements from being written to the transaction log serially. For example, the following statement takes approximately seven seconds to execute on my laptop:
CREATE InsertTable
 (
       IDCol INT IDENTITY(1,1),
       ColVal INT
 )
 GO
 DECLARE @Counter INT
 SET @Counter = 1   

 WHILE @Counter < 15000
 BEGIN
       INSERT INTO InsertTable(ColVal)
       SELECT DATEPART(ms, GETDATE())   

       SET @Counter = @Counter + 1
 END
If I wrap the INSERT statements inside of a transaction, it takes a little less than two seconds to execute. This is because the statements are inside of a transaction rather than committed to a transaction log until the transaction commits. This reduces the number of writes to the log.
DECLARE @Counter INT
 SET @Counter = 1
 BEGIN TRAN
 WHILE @Counter < 15000
 BEGIN
       INSERT INTO InsertTable(ColVal)
       SELECT DATEPART(ms, GETDATE())   

       SET @Counter = @Counter + 1
 END
 COMMIT TRAN
Note: I advise you to use this approach with care. If there are too many statements inside a transaction, it will increase the duration of the transaction, which increases the amount of time locks are held on the objects inside of the transaction.

Use UNION ALL instead of UNION

When you use the UNION clause to concatenate the results from two or more SELECT statements, duplicate records are removed. This duplicate removal requires additional computing to accomplish. If you are not concerned that your results may include duplicate records, use the UNION ALL clause, which concatenates the full results from the SELECT statements.

Use EXISTS when possible

When you need to check for the presence of certain conditions, it is usually faster to use theEXISTS function over COUNT(*). This is because COUNT(*) has to scan all records returned by the statement, while EXISTS will return a true value as soon as it finds a record that meets the criteria.

STATISTICS IO

There are different ways to determine the best way to write your queries. Two of my favorite methods are looking at the number of logical reads produced by the query and looking at graphical execution plans provided by SQL Server Management Studio. For determining the number of logical reads, you can turn the STATISTICS IO option ON. Consider this query:
SET STATISTICS IO ON
 SELECT * FROM SalesHistory
The following is returned in the Messages window in SQL Server Management Studio:
Table 'SalesHistory'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
There are several bits of data returned by STATISTICS IO, but I am really only concerned with the logical reads portion because it will tell me the number of pages read from the data cache. This is the most helpful to me because it will stay constant when I run the same query, which is important because there are sometimes external factors that might vary the execution time of my queries, such as locking by other queries.
When I’m tuning my queries, my goal is to get the number of logical reads as low as possible. Fewer logical reads typically leads to faster execution times.
Contact Us:

Email:

Vinodkumar434@gmail.com,
vinodtechnosoft@gmail.com

Skype Name:

vinodtechnosoft