How To forum

Question

Clean duplicate data from Excel files with 80000 records.

by dherruyadav / February 25, 2014 1:11 PM PST

Hello there,
We are an eCommerce company dealing in gifts. We have a database of 80,000 customers who have made purchases from our website.
Our two big problems are:
1) 5-10% of the customers are repeat customers, and their details have been entered multiple times.

2) There is a formatting issue with values. For example, phone numbers are written in various formats:
888-888-9999
(888)(888)(9999)
(888)-888-9999

Is there any software we can use to clean up this much data? Doing it manually would take a huge amount of manpower and time.

Thanks

All Answers

Answer
Re: clean duplicates
by Kees_B Forum moderator / February 25, 2014 6:00 PM PST

You can write your own application to do this in Excel VBA. Or you can export it to a real database system like MS Access and write SQL and VBA there.
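
For illustration only, here is a rough sketch of the duplicate-removal idea in Python rather than VBA or SQL. It assumes the sheet has been saved as CSV; the column names (name, email), the sample rows, and the helper dedupe_rows are all made up for the example, and which columns really identify a customer is something only you can decide.

```python
import csv
import io


def dedupe_rows(rows, key_columns):
    """Keep the first row seen for each distinct key.

    The key ignores letter case and extra spaces in the chosen columns.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(" ".join(row[c].split()).casefold() for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample standing in for a sheet saved as CSV.
sample = io.StringIO(
    "name,email\n"
    "Asha Rao,asha@example.com\n"
    "asha rao ,ASHA@example.com\n"
    "Vik Shah,vik@example.com\n"
)
rows = list(csv.DictReader(sample))
unique = dedupe_rows(rows, ["name", "email"])
print(len(rows), "rows in,", len(unique), "rows out")
```

This only catches rows that match once case and spacing are ignored. Misspelled names or changed addresses would need fuzzier matching and probably a manual review.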

The phone number issue is clear. The formatting issue isn't, as you can see. But does formatting (how it looks on the screen) really matter?
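
As a sketch of how the phone number part might be handled, one option is to strip everything except the digits before comparing. This is shown in Python as an illustration, and the function name normalize_phone is made up for the example. It assumes every number has the same length and no country code or extension, which may not hold for real data.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce a phone number to its digits so different layouts compare equal."""
    return re.sub(r"\D", "", raw)


# The three layouts from the question.
samples = ["888-888-9999", "(888)(888)(9999)", "(888)-888-9999"]
normalized = {normalize_phone(s) for s in samples}
print(normalized)  # a set with a single entry: {'8888889999'}
```

Once the numbers are stored as plain digits, how they are displayed becomes a separate choice that can be made later.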

Having your customer data in a spreadsheet seems rather useless. I'd first determine why you keep them, how you want to use them and what application you want to use for that. Once that is all set up, putting clean data into it comes into view.

Kees

Answer
I see folk have looked for this before. Example search
by R. Proffitt Forum moderator / February 26, 2014 2:02 AM PST