Written By Rohit Singh
Published on December 2nd, 2021
Updated on September 4, 2023
Are you searching for a reliable way to delete duplicate rows in SQL? Do duplicate rows keep turning up in your SQL tables? If so, this blog covers several solutions to the problem.
Duplicate rows in SQL are not a new issue. SQL users often run into this problem, and it gets in the way of their work. Duplicate rows can be avoided by using a primary key, identity columns, unique clustered and non-clustered indexes, constraints, etc. However, there are cases where these rules are not implemented, and duplicates find their way into a table.
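As a rough illustration of the prevention side, a primary key plus a UNIQUE constraint makes SQL Server reject a repeated value at insert time. The table name customer_demo and its columns are hypothetical and are not part of this blog's sample data.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE customer_demo
(
    customer_id INT IDENTITY(1,1) PRIMARY KEY,        -- surrogate key, unique per row
    email       VARCHAR(100) NOT NULL,
    CONSTRAINT UQ_customer_demo_email UNIQUE (email)  -- no two rows may share an email
);

INSERT INTO customer_demo (email) VALUES ('a@example.com');  -- succeeds

-- Fails with a UNIQUE KEY constraint violation, so no duplicate is stored
INSERT INTO customer_demo (email) VALUES ('a@example.com');
```

Tables created without such a constraint accept the second insert silently, which is the situation the rest of this blog deals with.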
A well-designed SQL database should not contain duplicate rows. To avoid the resulting issues and delete duplicates in SQL, follow the solutions provided below. But before that, let us take a look at why it is essential to delete duplicate records in SQL Server 2008 and other versions.
All the above-listed points show why it is necessary to prevent duplicate rows in SQL. So, follow the methods given below to delete duplicate records in SQL without a primary key.
Duplicate rows and records in SQL can be removed manually by applying different clauses. The methods to delete duplicate rows in SQL are described below. First, however, you need to create a sample table and data to run them against.
Use the commands below in the SQL database to create the sample data. They create a table that we will use to demonstrate how to resolve the duplicate rows issue in SQL.
create table original_table (key_value int )
insert into original_table values (1)
insert into original_table values (1)
insert into original_table values (1)
insert into original_table values (2)
insert into original_table values (2)
insert into original_table values (2)
insert into original_table values (2)
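Before deleting anything, it can help to confirm which values are actually duplicated. A minimal sketch against the sample table above (the column alias row_count is just an illustrative name):

```sql
-- List every key_value that occurs more than once, with its row count
SELECT key_value,
       COUNT(*) AS row_count
FROM original_table
GROUP BY key_value
HAVING COUNT(*) > 1
ORDER BY key_value;
```

With the sample data as inserted above, this returns key_value 1 with a count of 3 and key_value 2 with a count of 4.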
Let us now start with the methods to delete duplicate records in SQL Server 2000 and other versions.
In this method, we first use the Group By clause to identify the duplicates in the table. The data in the table is grouped by the specified columns, and any group with more than one row is a duplicate. We then delete the identified duplicate rows and insert a single copy of each back. Run the commands below in the SQL database to delete duplicate rows in SQL.
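-- Step 1: copy one row per duplicated key into a holding table
SELECT key_value
INTO duplicate_table
FROM original_table
GROUP BY key_value
HAVING COUNT(key_value) > 1;
-- Step 2: delete every copy of the duplicated keys from the original table
DELETE FROM original_table
WHERE key_value
IN (SELECT key_value
FROM duplicate_table);
-- Step 3: insert the single saved copy of each key back
INSERT INTO original_table (key_value)
SELECT key_value
FROM duplicate_table;
-- Step 4: drop the holding table
DROP TABLE duplicate_table;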
Once you complete the above steps, all the duplicate rows and records will be deleted. To execute this method without errors, the database needs enough free storage space to hold the temporary duplicate_table. If you cannot delete duplicate records in SQL without a primary key using this method, follow the other solution below.
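Because this method deletes rows and then inserts them back, an error between the two steps could leave the table without the affected rows. One possible safeguard, sketched here as an assumption rather than as part of the original method, is to run the steps in a single transaction. The temporary table name #duplicate_keys is hypothetical.

```sql
SET XACT_ABORT ON;   -- a run-time error rolls back the whole transaction

BEGIN TRANSACTION;

    -- One row per duplicated key, held in a temporary table
    SELECT key_value
    INTO #duplicate_keys
    FROM original_table
    GROUP BY key_value
    HAVING COUNT(*) > 1;

    -- Remove every copy of the duplicated keys
    DELETE FROM original_table
    WHERE key_value IN (SELECT key_value FROM #duplicate_keys);

    -- Put a single copy of each key back
    INSERT INTO original_table (key_value)
    SELECT key_value
    FROM #duplicate_keys;

COMMIT TRANSACTION;

DROP TABLE #duplicate_keys;
```

With this arrangement the delete and the re-insert either both take effect or neither does.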
Common Table Expressions, generally known as CTEs, can be used to delete duplicate records in SQL Server 2008 and other versions. This method applies the ROW_NUMBER function, which is available from SQL Server 2005 onward. It makes the task of deleting duplicate rows in SQL Server much easier. The script below applies ROW_NUMBER through a derived table, which works the same way as a CTE here; run it in the SQL database to complete this method.
DELETE T
FROM
(
SELECT *
,DupRank = ROW_NUMBER() OVER (
PARTITION BY key_value
ORDER BY (SELECT NULL)
)
FROM original_table
) AS T
WHERE DupRank > 1
When you run the above script, it first uses the ROW_NUMBER function to number the rows within each key_value partition and then deletes every row numbered higher than 1, leaving one row per key_value.
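The same delete can also be written with an actual CTE. The following is a minimal sketch of that form; the CTE name ranked is an arbitrary choice, and the logic is otherwise identical to the script above.

```sql
-- The leading semicolon ends any previous statement, which WITH requires
;WITH ranked AS
(
    SELECT key_value,
           ROW_NUMBER() OVER (
               PARTITION BY key_value       -- restart numbering for each key_value
               ORDER BY (SELECT NULL)       -- no preference for which copy is kept
           ) AS DupRank
    FROM original_table
)
DELETE FROM ranked
WHERE DupRank > 1;                          -- keep row 1 of each group, delete the rest
```

Deleting from the CTE removes the matching rows from original_table itself, since the CTE reads from that single base table.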
Now, if you compare the above two methods to delete duplicate rows in SQL, you will find the second method, using the ROW_NUMBER function, to be more efficient. The reasons for the same are given below.
The only drawback of this method is that it does not work with versions older than SQL Server 2005.
To describe this method effectively, we will guide you by creating a new sample table and data.
CREATE TABLE Employee
(
[ID] INT identity(1,1),
[FirstName] Varchar(100),
[LastName] Varchar(100),
[Country] Varchar(100),
)
GO
-- The original statement supplied no Country value, so NULL is used here
INSERT INTO Employee ([FirstName],[LastName],[Country]) VALUES ('ABC','DEF',NULL)
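A duplicate-removal method needs repeated rows to act on. A minimal sketch that loads a few employees, some of them repeated, is given here; all names and countries in it are made up for illustration and do not come from the original sample data.

```sql
-- Made-up rows: 'John Smith' appears twice and 'Asha Rao' three times
INSERT INTO Employee ([FirstName], [LastName], [Country]) VALUES ('John',  'Smith', 'USA');
INSERT INTO Employee ([FirstName], [LastName], [Country]) VALUES ('John',  'Smith', 'USA');
INSERT INTO Employee ([FirstName], [LastName], [Country]) VALUES ('Maria', 'Lopez', 'Spain');
INSERT INTO Employee ([FirstName], [LastName], [Country]) VALUES ('Asha',  'Rao',   'India');
INSERT INTO Employee ([FirstName], [LastName], [Country]) VALUES ('Asha',  'Rao',   'India');
INSERT INTO Employee ([FirstName], [LastName], [Country]) VALUES ('Asha',  'Rao',   'India');
```

Each row still receives its own ID from the identity column, so the repeated rows differ only in ID.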
The RANK function can be used to delete duplicate rows in SQL. Combined with the Partition By clause, which splits the data into subsets of rows sharing the same values, and ordered by the unique ID column, it gives each row within a subset a different rank. So, run the set of commands below in the SQL database to carry out the process.
SELECT E.ID,
E.firstname,
E.lastname,
E.country,
T.rank
FROM [SampleDB].[dbo].[Employee] E
INNER JOIN
(
SELECT *,
RANK() OVER(PARTITION BY firstname,
lastname,
country
ORDER BY id) rank
FROM [SampleDB].[dbo].[Employee]
) T ON E.ID = t.ID;
Now that you have ranked and identified the duplicate data using the above query, you can move ahead with the deletion process using the statement below.
DELETE E
FROM [SampleDB].[dbo].[Employee] E
INNER JOIN
(
SELECT *,
RANK() OVER(PARTITION BY firstname,
lastname,
country
ORDER BY id) rank
FROM [SampleDB].[dbo].[Employee]
) T ON E.ID = t.ID
WHERE rank > 1;
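RANK gives tied rows the same number, so the delete above relies on ORDER BY id, where id is unique, to give every row in a group a different rank. A small sketch of the difference follows, using an inline list of made-up values; the VALUES table constructor used here assumes SQL Server 2008 or later.

```sql
SELECT v,
       RANK()       OVER (ORDER BY v) AS rnk,   -- ties share a rank: 1, 1, 3
       ROW_NUMBER() OVER (ORDER BY v) AS rn     -- always distinct:   1, 2, 3
FROM (VALUES (10), (10), (20)) AS t(v);
```

If the ORDER BY column could contain ties within a partition, ROW_NUMBER is the safer choice for a delete that filters on a rank greater than 1, because tied rows ranked 1 by RANK would all be kept.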
All the above methods to delete duplicate rows in SQL are capable of completing the task. However, it is recommended that you do not apply these methods and clauses directly to live data. It is safer to test them on sample data first.
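One way to rehearse a delete, sketched here under the assumption that the Employee table lives in the current database, is to run it inside a transaction, inspect the outcome, and roll it back. The alias DupRank is an illustrative name.

```sql
BEGIN TRANSACTION;

    DELETE E
    FROM Employee AS E
    INNER JOIN
    (
        SELECT ID,
               RANK() OVER (PARTITION BY FirstName, LastName, Country
                            ORDER BY ID) AS DupRank
        FROM Employee
    ) AS T ON E.ID = T.ID
    WHERE T.DupRank > 1;

    SELECT @@ROWCOUNT AS rows_deleted;   -- must directly follow the DELETE

    -- Inspect what is left before deciding
    SELECT ID, FirstName, LastName, Country
    FROM Employee
    ORDER BY ID;

ROLLBACK TRANSACTION;   -- undo the delete; switch to COMMIT once the result looks right
```

Nothing is changed permanently while the script ends in ROLLBACK, so it can be repeated until the remaining rows are the expected ones.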
Nonetheless, if you cannot execute the above methods to delete duplicate records in SQL Server 2000 and other versions, there may be a possibility that the SQL database file is corrupted. Moreover, corruption in SQL databases also leads to data inaccuracy and duplication.
To fix this issue, there is an expert solution to recover and repair the SQL database files to remove any duplication. The Cigati SQL Recovery Tool is a robust utility that can repair the corrupt and damaged MDF and NDF files of the SQL database. The utility resolves errors in the SQL database that occur due to damage in the SQL files. Furthermore, the software is crafted with features that make the recovery process smooth.
Duplicate rows in SQL create data inaccuracy issues for users, so it is essential to tackle the issue and fix it. To delete duplicate rows in SQL, use the manual solutions mentioned in this blog. All the methods are capable of handling the duplicate removal task in SQL. Nevertheless, if there is corruption in the SQL files and you are facing issues because of that, it is suggested to opt for the SQL Recovery Tool. It is a tool to recover and repair corrupt SQL database files, including all objects like tables, views, programmability, triggers, etc.
About The Author:
Rohit Singh is an Email Backup, Recovery & Migration Consultant and has been associated with a software company for the last 3 years. He writes technical updates and feature articles related to MS Outlook, Exchange Server, Office 365, and many other Email Clients & Servers.