Question

Clean duplicate data from Excel files with 80,000 records.

Feb 25, 2014 1:11PM PST

Hello there,
We are an eCommerce company dealing in gifts. We have a database of 80,000 customers who have made purchases from our website.
Our big problems are:
1) 5-10% of our customers are repeat customers, and their details have been entered multiple times.

2) There are formatting issues with values. For example, phone numbers are written in various formats:
888-888-9999
(888)(888)(9999)
(888)-888-9999

Is there any software we can use to clean this much data? Doing it manually would take a huge amount of manpower and time.

Thanks

Discussion is locked

Answer
Re: clean duplicates
Feb 25, 2014 6:00PM PST

You can write your own application to do this in Excel VBA. Or you can export it to a real database system like MS Access and write SQL and VBA there.

The phone number issue is clear. The formatting issue isn't, as you can see. But does formatting (how it looks on the screen) really matter?
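
To make the phone number part concrete, here is a minimal sketch in Python rather than VBA (my choice for illustration, not something the original poster uses). It assumes the rows have already been read out of the workbook into a list of dictionaries; the helper names `normalize_phone` and `dedupe`, the field names, and the sample customers are all made up, and the "888 888 9999" format is a hypothetical extra variant. Treating the digits-only phone number as the duplicate key is also an assumption, since real data may need name or email matching as well.

```python
import re


def normalize_phone(raw):
    """Strip everything except digits so different formats compare equal."""
    return re.sub(r"\D", "", raw)


def dedupe(records, key="phone"):
    """Keep the first record seen for each normalized phone number."""
    seen = set()
    unique = []
    for rec in records:
        normalized = normalize_phone(rec[key])
        if normalized in seen:
            continue  # same customer entered again in another format
        seen.add(normalized)
        # Store the normalized value so the cleaned data is consistent.
        unique.append({**rec, key: normalized})
    return unique


# Hypothetical sample rows; in practice these would come from the Excel file.
customers = [
    {"name": "A. Smith", "phone": "888-888-9999"},
    {"name": "A Smith", "phone": "(888)-888-9999"},
    {"name": "A. Smith", "phone": "888 888 9999"},
    {"name": "B. Jones", "phone": "777-555-1234"},
]

cleaned = dedupe(customers)
for row in cleaned:
    print(row["name"], row["phone"])
```

With 80,000 rows this runs in a single pass, since each lookup in the `seen` set is effectively constant time.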

Having your customer data in a spreadsheet seems rather useless. I'd first determine why you keep the data, how you want to use it, and what application you want to use for that. Once that is all set up, putting clean data into it comes into view.
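
As a rough illustration of what "putting clean data into it" could look like, the sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for a database like MS Access. The table layout, column names, and sample rows are assumptions, and using the digits-only phone number as the primary key is just one possible way to define a duplicate. The point is that the database itself can refuse repeat entries once a key is chosen.

```python
import sqlite3

# In-memory database for the sketch; a real setup would use a file or server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (phone TEXT PRIMARY KEY, name TEXT NOT NULL)"
)

# Hypothetical rows whose phone numbers were already reduced to digits.
rows = [
    ("8888889999", "A. Smith"),
    ("8888889999", "A Smith"),   # repeat customer, same phone number
    ("7775551234", "B. Jones"),
]

# INSERT OR IGNORE skips any row that would violate the primary key,
# so only the first entry per phone number is stored.
conn.executemany(
    "INSERT OR IGNORE INTO customers (phone, name) VALUES (?, ?)", rows
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count, "unique customers stored")
```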

Kees

Answer
I see folk have looked for this before. Example search
Feb 26, 2014 2:02AM PST