How To forum


Clean duplicate data from Excel files with 80,000 records.

by dherruyadav / February 25, 2014 1:11 PM PST

Hello There,
We are an eCommerce company dealing in gifts. We have a database of 80,000 customers who have made purchases from our website.
The big problems we have are -
1) 5-10% of our customers are repeat customers, and their details have been entered multiple times.

2) There is a formatting issue with values. For example -
Phone numbers are written in various formats -

Is there any way we can clean this much data using some software? Doing it manually would take a huge amount of manpower and time.



All Answers

Re: clean duplicates
by Kees_B Forum moderator / February 25, 2014 6:00 PM PST

You can write your own application to do this in Excel VBA. Or you can export it to a real database system like MS Access and write SQL and VBA there.

The phone number issue is clear. The formatting issue isn't, as you can see. But is formatting (how it looks on the screen) interesting?

Having your customer data in a spreadsheet seems rather useless. I'd first determine why you keep them, how you want to use them and what application you want to use for that. Once that is all set up, putting clean data into it comes into view.
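
For what it's worth, here is a rough sketch of what such a home-made cleanup program might look like, written in Python rather than VBA purely for illustration. It assumes things the original post doesn't say: that the sheet has been saved as CSV, that the columns are called `name`, `email` and `phone` (hypothetical names), that phone numbers are 10 digits long once country and trunk prefixes are dropped, and that two rows with the same normalized phone number (or the same email, if there is no phone) are the same customer. Adjust all of that to your real data.

```python
import re


def normalize_phone(raw, digits=10):
    """Drop everything except digits, then keep the last `digits` digits.

    Assumption: national numbers are `digits` long, so any country or
    trunk prefix sits at the front and can be cut off.
    """
    only_digits = re.sub(r"\D", "", raw or "")
    return only_digits[-digits:]


def dedupe(rows):
    """Keep the first row seen for each normalized phone (or email)."""
    seen = set()
    unique = []
    for row in rows:
        phone = normalize_phone(row.get("phone", ""))
        email = (row.get("email") or "").strip().lower()
        key = phone or email  # fall back to email when there is no phone
        if key and key in seen:
            continue  # already have this customer, skip the repeat
        if key:
            seen.add(key)
        # Store the cleaned phone so every kept row uses one format.
        unique.append(dict(row, phone=phone))
    return unique


# Made-up sample rows; the column names are hypothetical.
rows = [
    {"name": "A. Kumar", "email": "a@example.com", "phone": "+91 98765-43210"},
    {"name": "Ajay Kumar", "email": "A@Example.com", "phone": "(098) 7654 3210"},
    {"name": "B. Singh", "email": "b@example.com", "phone": "91234 56789"},
]

cleaned = dedupe(rows)
for row in cleaned:
    print(row["name"], row["phone"])
```

With a real export you would read the rows with `csv.DictReader` instead of typing them in. Keeping the first row and silently dropping the rest is only one possible policy; with 80,000 records you may prefer to write the suspected duplicates to a separate file and review them before deleting anything.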


I see folk have looked for this before. Example search
by R. Proffitt Forum moderator / February 26, 2014 2:02 AM PST