1. Computer problem? Tech Support Guy is completely free -- paid for by advertisers and donations. Click here to join today! If you're new to Tech Support Guy, we highly recommend that you visit our Guide for New Members.

Filtering the duplicate entries

Discussion in 'Business Applications' started by Caliraj, Jul 26, 2006.

Thread Status:
Not open for further replies.
Advertisement
  1. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    I have a text file that has contents as below :

    Duplicate molecule(s) found for regno 1:
    3
    Duplicate molecule(s) found for regno 2:
    5 8
    Duplicate molecule(s) found for regno 3:
    1

    I have attached a sample file also(sample_data.txt).Is there a macro or a function by which I can remove the duplicates?
    For eg.
    For regno 1,there is a duplicate at Regno.3.Now again when it comes to Regno 3,it says there is a duplicate at Regno 1.I can delete this manually but actually there are around 100000 regnos for which doing the manual way would take a minimum of a week.Hence could anyone help me out with this issue?
     

    Attached Files:

  2. The Villan

    The Villan

    Joined:
    Feb 20, 2006
    Messages:
    2,255
    If you can get the records into Excel, there is a quick solution.

    In Excel there is a menu choice called Data, Filter, Advanced Filter

    I have uploaded a simple example, column A and B is the data.
    Cells F1/F2 is the criteria (query) F2 left blank means all records
    Colums J & K is the result after applying the Advanced Filter for unique records only.

    If this is of any use, I can provide details of how to do it.
     

    Attached Files:

  3. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    thanks for the reply.Actually i had also opened the text file with Excel, butt then i could not proceed further.After opening the file in Excel,I replaced all the words with a space so that u get only nos.
    One more thing is that some regnos have 2 or 3 duplicates(eg, for reggno 3, there is a duplicate at 22879 & 101511).I have attached a screen shot of that & the original file .Could you help?
     

    Attached Files:

  4. The Villan

    The Villan

    Joined:
    Feb 20, 2006
    Messages:
    2,255
    Blimey I can see your problem. My answer doesn't fit what you need.

    This is another way to get rid of the text and finish up with just the numbers.

    Copy the data from notepad into a blank spreadsheet.

    Click on Data, Text to columns, Select delimited and click on Next

    Make sure only the space option is selected, then click on Next and then Finish

    Next step to get rid of text
    Click on Cell A1

    Click on the Edit menu and click on Goto.

    In the dialogue box, click on the option Special...

    In the next dialogue box

    Click on Constants and under the formulas options, turn off the Numbers, Logicals and Errors options, leaving just text switched on. Click on OK

    Now click on Edit, Delete, Shift cells up, and then click on OK.

    You should now be left with only cells with numbers in.

    Step 3

    Select columns A, B, C

    Click on the Edit menu and click on Goto.

    In the dialogue box, click on the option Special...

    In the next dialogue box

    Click on Blanks then Click on OK

    Now click on Edit, Delete, Shift cells up, and then click on OK.

    You should now be left with colums A, B and C and all the numbers are at the top of the columns.

    Select the numbers in column B and Move the numbers into column A
    Do the same for column C

    You should now have all the numbers in one column.

    Its long winded, but will do the job in a few steps.

    I have to go out now, so have a go at it and post back if it works.
     
  5. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    Just a clarification.the steps that you had given is to remove the text if im right?If thats the case, actually i had opened the text file using Excel & then replaced the words with spaces so that finally only the numbers will be present(i have attached a sample excel sheet).I would like to know if the duplicates can now be removed.kindly help me to proceed further.
    Thanks & Regards.
     

    Attached Files:

  6. The Villan

    The Villan

    Joined:
    Feb 20, 2006
    Messages:
    2,255
    Caliraj
    Just got back.

    You need to forget that one. Thats the first suggestion. The numbers haven't been split in some cells

    You need to start again and follow my new instructions as at 26 July 2006 11:14 am, in order to get all the numbers into one column.

    When you have achieved that, then you can do the data, advanced filter concept.

    Best to keep to a small amount of data to get the concept right.

    Sorry about that.
     
  7. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    I think you are talking about your 1st mail in which u had sent an attachment with an example of advance filter.i was unable to get theat concept of how to choose the criteria & list range.moreover,some cells will also have more than 1 numbe(like 22879 & 101511).so im not quite sure how to go about it.could you get mu problem?
    Thanks & regards.
     
  8. The Villan

    The Villan

    Joined:
    Feb 20, 2006
    Messages:
    2,255
    Your dataa.xls example is the one I am working on. You can't do an advanced data filter, until you have all the numbers into one column.

    Have you carried out my new instructions as at 26 July 2006 11:14 am, in order to get all the numbers into one column?

    If so, then he advanced filter is next. If not, you must do that first.

    For advanced filter, you have your raw data with field headings.

    You copy the filed headings into another location on the same spreadsheet. These field headings will form the basis for you criteria(Query). If the cells directly below the field heading are left blank, then Excel will assume that you want to do an advanced filter on all the numbers. if you put in the blank cell >20000, it would ignore all numbers below and equal to 20000.

    You also copy the field headings to a blank area of the spreadsheet. These headings will form the basis for the results of the query. Excel will look at your criteria and find all the records that match your criteria(Query) and drop a copy of all the records that match your query under the field headings.

    How do you do that once the headings are set up, using my example attached. AdvFilter.xls

    Make cell A1 the active cell

    Click on the data menu, select advanced data filter...

    In the advanced data filter dialogue box do the following

    1. Make sure that under Action, you have selected "Copy to another location"
    2. Excel will already have detected the List Range (If not, then change it accordingly)
    3. Set the Criteria range by typing $F$1:$F$2
    4. Set the Copy to range as $J$1:$K$1
    5. Make sure that the option "unique records only" is turned on

    6. Check you have everything correct and then click on OK
    7. You should now have under cells J1 and K1 all the records that match your criteria(query), which was - look at all the records in A1 to B9 and copy only one version of each number(unique records only) into under J1 to K1.

    If that didn't work, you did something wrong in the data filter dialogue box.

    Please note that it is best to copy field headings from your raw data to the criteria area and the Copy to Range area. Excel requires the field headings to be exactly the same, so it is not good practise to duplicate the filed headings by typing them again. Likewise Excel does not like you using several filed headings that are the same, because it gets confused. So for example, if you have 3 fields called "Number", you need to give them different names, such as Number1, Number2, Number3
     

    Attached Files:

  9. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    Actually i reside in india so the time zones differ.So im not quite sure which mail you are talking about.I presume it is the 1sr reply i got from you where you had sent me an attachment(the 2nd reply had instructions to remove all the text I guess).The dataa.xls sheet has numbers in 2 columns(A & E).Now should i move the numbers from column E to A?(here im not sure how do I move them because if i copy,then the numbers in A would be overwritten).
    Let me know if im right about my interpretation.
    Thanks & Regards.
     
  10. The Villan

    The Villan

    Joined:
    Feb 20, 2006
    Messages:
    2,255
    What I want you to do is the following.
    In your first post of this thread, you uploaded a file called Sample Data.txt.
    I want you to do the following with that original data. Forget anything else you have done.

    Copy the data from notepad into a blank spreadsheet.

    Click on Data, Text to columns, Select delimited and click on Next

    Make sure only the space option is selected, then click on Next and then Finish

    Next step to get rid of text
    Click on Cell A1

    Click on the Edit menu and click on Goto.

    In the dialogue box, click on the option Special...

    In the next dialogue box

    Click on Constants and under the formulas options, turn off the Numbers, Logicals and Errors options, leaving just text switched on. Click on OK

    Now click on Edit, Delete, Shift cells up, and then click on OK.

    You should now be left with only cells with numbers in.

    Step 3

    Select columns A, B, C

    Click on the Edit menu and click on Goto.

    In the dialogue box, click on the option Special...

    In the next dialogue box

    Click on Blanks then Click on OK

    Now click on Edit, Delete, Shift cells up, and then click on OK.

    You should now be left with colums A, B and C and all the numbers are at the top of the columns.

    Select the numbers in column B and click on the Edit menu and select move
    Click on the first blank cell in column A (directly under the numbers already there)
    Clikc on the Edit menu and select paste.
    This will place the numbers in column B under the numbers in column A

    Do the same for column C i.e.
    Select the numbers in column C and click on the Edit menu and select move
    Click on the first blank cell in column A (directly under the numbers already there)
    Click on the Edit menu and select paste.
    This will place the numbers in column C under the numbers in column A

    You should now have all the numbers in one column.

    Now we can move onto the Data, Advanced Filter part

    For advanced filter, you have your raw data with field headings.

    You copy the filed headings into another location on the same spreadsheet. These field headings will form the basis for you criteria(Query). If the cells directly below the field heading are left blank, then Excel will assume that you want to do an advanced filter on all the numbers. if you put in the blank cell >20000, it would ignore all numbers below and equal to 20000.

    You also copy the field headings to a blank area of the spreadsheet. These headings will form the basis for the results of the query. Excel will look at your criteria and find all the records that match your criteria(Query) and drop a copy of all the records that match your query under the field headings.

    How do you do that once the headings are set up, using my example attached. AdvFilter.xls

    Click on cell A1 to make it active

    Click on the data menu, select advanced data filter...

    For advanced filter, you have your raw data with field headings.

    You copy the filed headings into another location on the same spreadsheet. These field headings will form the basis for you criteria(Query). If the cells directly below the field heading are left blank, then Excel will assume that you want to do an advanced filter on all the numbers. if you put in the blank cell >20000, it would ignore all numbers below and equal to 20000.

    You also copy the field headings to a blank area of the spreadsheet. These headings will form the basis for the results of the query. Excel will look at your criteria and find all the records that match your criteria(Query) and drop a copy of all the records that match your query under the field headings.

    How do you do that once the headings are set up, using my example attached. AdvFilter.xls

    Make cell A1 the active cell

    Click on the data menu, select advanced data filter...

    In the advanced data filter dialogue box do the following

    1. Make sure that under Action, you have selected "Copy to another location"
    2. Excel will already have detected the List Range (If not, then change it accordingly)
    3. Set the Criteria range by typing $F$1:$F$2
    4. Set the Copy to range as $J$1:$K$1
    5. Make sure that the option "unique records only" is turned on

    6. Check you have everything correct and then click on OK
    7. You should now have under cells J1 and K1 all the records that match your criteria(query), which was - look at all the records in A1 to B9 and copy only one version of each number(unique records only) into under J1 to K1.

    If that didn't work, you did something wrong in the data filter dialogue box.

    Please note that it is best to copy field headings from your raw data to the criteria area and the Copy to Range area. Excel requires the field headings to be exactly the same, so it is not good practise to duplicate the filed headings by typing them again. Likewise Excel does not like you using several filed headings that are the same, because it gets confused. So for example, if you have 3 fields called "Number", you need to give them different names, such as Number1, Number2, Number3
     

    Attached Files:

  11. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    Thanks for the replies.Regarding moving the column B, im facing a problem.After step 3,you have mentioned that I should be left with columns A,B & C and all the numbers are on top of the columns.Could you explain what you mean by the statement,"on top of the columns'?I have attached the xl sheet where i have carried out till step 3.
    Im not able to get the option 'Move'.Inbstead i get 'Move or copy'.I have attached the snapshot of that also.I feel im going wrong somewhere.Kindly let me know about that.
    Thanks & Regards.
     

    Attached Files:

  12. The Villan

    The Villan

    Joined:
    Feb 20, 2006
    Messages:
    2,255
    I am sorry I should have said Edit, Cut not Edit, Move

    By top of the column, I mean that you should have numbers from cell B1 and C1

    Great we are getting somewhere :) Using the example you just posted, continue using the instruction below (forget any prior instructions)

    Select the cells B1 to B21 (do not select the complete column) and click on the Edit menu and select Cut
    Click on the first blank cell in column A (directly under the numbers already there). In your example it is A172
    Click on the Edit menu and select paste.
    This will place the numbers in column B under the numbers in column A

    Do the same for column C i.e.
    Select the cells C1 to C21 (do not select the complete column) and click on the Edit menu and select Cut
    Click on the first blank cell in column A (directly under the numbers already there). In your example it is A193 (hopefully :) )
    Click on the Edit menu and select paste.
    This will place the numbers in column C under the numbers in column A
    You should now have all the numbers in one column.

    Now we can move onto the Data, Advanced Filter part

    For advanced filter, you have your raw data with field headings.

    You copy the field headings into another location on the same spreadsheet. These field headings will form the basis for you criteria(Query). If the cells directly below the field heading are left blank, then Excel will assume that you want to do an advanced filter on all the numbers. if you put in the blank cell >20000, it would ignore all numbers below and equal to 20000.

    You also copy the field headings to a blank area of the spreadsheet. These headings will form the basis for the results of the query. Excel will look at your criteria and find all the records that match your criteria(Query) and drop a copy of all the records that match your query under the field headings.

    Do the following

    Click on cell A1 to make it active
    Then click on the menu Insert and select rows

    You should have a blank row1

    In cell A1 Type - Numbers (to give column A a field heading

    Now copy it to cell H1 and J1

    Make sure you select A1

    Click on the data menu, select advanced data filter...

    Click on the data menu, select advanced data filter...

    In the advanced data filter dialogue box do the following

    1. Make sure that under Action, you have selected "Copy to another location"
    2. Excel will already have detected the List Range (If not, then change it accordingly)
    3. Set the Criteria range by typing $H$1:$H$2
    4. Set the Copy to range as $J$1
    5. Make sure that the option "unique records only" is turned on

    6. Check you have everything correct and then click on OK
    7. You should now have under cells J1 all the records that match your criteria(query), which was - look at all the records in column A and copy only one version of each number(unique records only) into under J1.

    If that didn't work, you did something wrong in the data filter dialogue box.

    Please note that it is best to copy field headings from your raw data to the criteria area and the Copy to Range area. Excel requires the field headings to be exactly the same, so it is not good practise to duplicate the filed headings by typing them again. Likewise Excel does not like you using several filed headings that are the same, because it gets confused. So for example, if you have 3 fields called "Number", you need to give them different names, such as Number1, Number2, Number3

    Keep on coming back if things don't go right, or the instructions aren't clear.
     
  13. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    i have got all the nos from B & C under A.but I don get whet you mean by raw data with field headings.The xl sheet has nos in column A & F without any headings after the 3rd step I guess.
    thanks & regards.
     
  14. The Villan

    The Villan

    Joined:
    Feb 20, 2006
    Messages:
    2,255
    Don't worry about that for the moment, just do the next steps - you are almost there. I have to go out shortly, so I would like to get the job done if possib le before I go.

    Do the following

    Click on cell A1 to make it active
    Then click on the menu Insert and select rows

    You should have a blank row1

    In cell A1 Type - Numbers (to give column A a field heading

    Now copy it to cell H1 and J1

    Make sure you select A1

    Click on the data menu, select advanced data filter...

    Click on the data menu, select advanced data filter...

    In the advanced data filter dialogue box do the following

    1. Make sure that under Action, you have selected "Copy to another location"
    2. Excel will already have detected the List Range (If not, then change it accordingly)
    3. Set the Criteria range by typing $H$1:$H$2
    4. Set the Copy to range as $J$1
    5. Make sure that the option "unique records only" is turned on

    6. Check you have everything correct and then click on OK
    7. You should now have under cells J1 all the records that match your criteria(query), which was - look at all the records in column A and copy only one version of each number(unique records only) into under J1.
     
  15. Caliraj

    Caliraj Thread Starter

    Joined:
    Mar 22, 2006
    Messages:
    145
    Hi,
    I ahve done exactly as per your instructions.But im not sure if the duplicates are removed.I have attached the resultant xl sheet for verification.
    Thanks & Regards.
     

    Attached Files:

  16. Sponsor

As Seen On
As Seen On...

Welcome to Tech Support Guy!

Are you looking for the solution to your computer problem? Join our site today to ask your question. This site is completely free -- paid for by advertisers and donations.

If you're not already familiar with forums, watch our Welcome Guide to get started.

Join over 733,556 other people just like you!

Thread Status:
Not open for further replies.

Short URL to this thread: https://techguy.org/486460

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice