How To forum


Clean duplicate data from Excel files with 80000 records.

by dherruyadav / February 25, 2014 1:11 PM PST

Hello there,
We are an eCommerce company dealing in gifts. We have a database of 80000 customers who have made purchases from our website.
We have two big problems:
1) 5-10% of our customers are repeat customers, and their details have been entered multiple times.

2) There are formatting issues with the values. For example -
Phone numbers are written in various formats -

Is there any way we can clean this much data using some software? Doing it manually would take a huge amount of manpower and time.



All Answers

Re: clean duplicates
by Kees_B Forum moderator / February 25, 2014 6:00 PM PST

You can write your own application to do this in Excel VBA. Or you can export it to a real database system like MS Access and write SQL and VBA there.
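To give an idea of what the SQL route could look like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MS Access, so it is not the VBA or Access solution itself. The table name `customers`, the columns `name`, `email` and `phone`, and the rule "same email means same customer" are all assumptions for illustration; the original post doesn't say which fields exist or what counts as a duplicate.

```python
import sqlite3


def dedupe_customers(rows):
    """Keep the first entry per email address and drop later repeats.

    rows: list of (name, email, phone) tuples. These columns are
    hypothetical; adapt them to whatever the real sheet contains.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (name, email, phone) VALUES (?, ?, ?)", rows
    )
    # Assumption: two rows are the same customer when their emails match
    # after trimming spaces and ignoring upper/lower case.
    conn.execute(
        """
        DELETE FROM customers
        WHERE id NOT IN (
            SELECT MIN(id) FROM customers GROUP BY LOWER(TRIM(email))
        )
        """
    )
    kept = conn.execute(
        "SELECT name, email, phone FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return kept
```

Whatever matching rule you pick, run it on a copy of the data first and look through what gets deleted before trusting it with all 80000 records.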

The phone number issue is clear. The formatting issue isn't, as you can see. But is formatting (how it looks on the screen) interesting?
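For the phone numbers, one common approach is to strip everything that isn't a digit and compare what is left. A rough sketch in Python follows. The function name `normalize_phone` and the `keep_last=10` default are my assumptions: the post doesn't show the actual formats, so this guesses at 10-digit national numbers where a country code or a leading 0 can be dropped. Adjust it to your own numbers.

```python
import re


def normalize_phone(raw, keep_last=10):
    """Reduce a phone number to its digits so different spellings compare equal.

    keep_last is an assumed national number length (hypothetical default of
    10); shorter inputs are returned as-is so they can be reviewed by hand.
    """
    # Spreadsheet exports sometimes hand numbers over as floats such as
    # 9876543210.0; convert those to int so the ".0" adds no stray digit.
    if isinstance(raw, float) and raw.is_integer():
        raw = int(raw)
    # Drop spaces, dashes, brackets, plus signs and anything else non-numeric.
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) >= keep_last:
        return digits[-keep_last:]
    return digits
```

With this, spellings such as "+91 98765-43210" and "098765 43210" both come out as "9876543210", so they can be matched as the same number.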

Having your customer data in a spreadsheet seems rather useless. I'd first determine why you keep them, how you want to use them and what application you want to use for that. Once that is all set up, putting clean data into it comes into view.


I see folk have looked for this before. Example search
by R. Proffitt Forum moderator / February 26, 2014 2:02 AM PST