
Need OCR Software for email addresses

Discussion in 'All Other Software' started by wisedave, Jul 19, 2011.

Thread Status:
Not open for further replies.
  1. wisedave

    wisedave Thread Starter

    Joined:
    Mar 12, 2010
    Messages:
    108
    Hello,

    I have a business directory from the local chamber of commerce. I am looking for an OCR program so I can scan the pages and convert them to text. From there I want to harvest the email addresses.

    Any ideas on how to do this? I suspect it would involve two different applications, but am not sure.
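Here is a minimal Python sketch of the second step: pulling email addresses out of text an OCR program has already produced. It assumes the OCR output has been saved as plain text. The function name `extract_emails` and the regular expression are illustrative choices, not part of any OCR product. The pattern is deliberately simple and will not accept every address the email standards allow. It also cannot repair characters the OCR misread.

```python
import re

# Deliberately simple pattern: a local part, "@", then a domain containing at
# least one dot and ending in a 2+ letter suffix. Not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return the unique email addresses found in OCR text, in first-seen order."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this de-duplicates in order
    for match in EMAIL_RE.findall(text):
        # Lowercasing is an assumption made for de-duplication; strictly, the
        # local part of an address may be case-sensitive.
        seen.setdefault(match.lower(), None)
    return list(seen)


if __name__ == "__main__":
    sample = "Acme Ltd  info@acme.example  555-0100\nBeta Co  Sales@Beta.example"
    print(extract_emails(sample))  # prints ['info@acme.example', 'sales@beta.example']
```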

    What are your ideas on how to solve this?

    Cheers,

    Dave
     
  2. DaveA

    DaveA Trusted Advisor Spam Fighter

    Joined:
    Nov 16, 1999
    Messages:
    15,596
    First Name:
    David
    First of all, is the "business directory" copyrighted? If so, it will be illegal to do what you want to do.

    But any good OCR program will make a spreadsheet for you from the scan. Read up on what the OCR program you have can do and go from there.
     
  3. wisedave

    wisedave Thread Starter

    Joined:
    Mar 12, 2010
    Messages:
    108
    Hello Dave,

    The only reason I am copying the directory is to harvest the email addresses. There are so many that processing them manually would be too labour and time intensive. I don't think copyright comes into play here.

    Thanks for your input, though.

    Cheers,

    Dave
     
  4. DaveA

    DaveA Trusted Advisor Spam Fighter

    Joined:
    Nov 16, 1999
    Messages:
    15,596
    First Name:
    David
    I use an OCR program a lot.
    Can you post a sample of what these pages look like? It may give us more information on how we can help.

    Yes, if the publication is copyrighted, then ANY means of harvesting can be illegal, sorry.
     
  5. wisedave

    wisedave Thread Starter

    Joined:
    Mar 12, 2010
    Messages:
    108
    Hey Dave,

    Attached is a scan. It's 300 dpi, greyscale. If you need the settings changed or have another recommendation, let me know.

    Thanks for your efforts.

    Cheers,

    Dave
     


  6. DaveA

    DaveA Trusted Advisor Spam Fighter

    Joined:
    Nov 16, 1999
    Messages:
    15,596
    First Name:
    David
    That one is going to need a lot of manual work.
    The OCR program should be able to produce a ".doc" file, which could be edited into a tab-separated list and then imported into a spreadsheet.

    It could also be OCR'd straight into a spreadsheet, but I see it going across the page for each row, and that is NOT what you want.
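As a rough illustration of the "tabbed list" idea, the sketch below writes records as tab-separated text, which spreadsheet programs can generally import. It assumes the OCR output has already been split into one list of fields per business. `to_tab_separated` is a hypothetical helper name, and the sample records are made up.

```python
import csv
import io


def to_tab_separated(records: list[list[str]]) -> str:
    """Render records as tab-separated text, one record per line."""
    buffer = io.StringIO()
    # The csv module quotes any field that itself contains a tab, so a stray
    # tab from the OCR output does not shift the spreadsheet columns.
    # lineterminator="\n" overrides csv's default "\r\n".
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [["Acme Ltd", "info@acme.example"], ["Beta Co", "sales@beta.example"]]
    print(to_tab_separated(rows), end="")
```

Using the `csv` module rather than joining with `"\t"` by hand is a design choice: it handles quoting of awkward fields for you.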
     
  7. DaveA

    DaveA Trusted Advisor Spam Fighter

    Joined:
    Nov 16, 1999
    Messages:
    15,596
    First Name:
    David
    You might want to look into something like this: http://www.planon.com/products

    I know a lot of people who do family research use these all the time.
     
  8. Squashman

    Squashman Trusted Advisor

    Joined:
    Apr 4, 2003
    Messages:
    19,783
    They have this publication available for download on their website in PDF format. It even lets you select which pages you want to download.
     
  9. Squashman

    Squashman Trusted Advisor

    Joined:
    Apr 4, 2003
    Messages:
    19,783
    I used a free PDF-to-text program to see what it would do, and it pretty much makes the text unusable. It pulls the text out line by line; there is no way for it to pull the data out column by column.
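If the extraction tool preserves the horizontal spacing of each line, the columns can sometimes be recovered by slicing every line at fixed character positions. This Python sketch assumes that spacing survives, and that the column start offsets (the `starts` list) have been measured by hand from the output. Both are assumptions, and `split_columns` and `reading_order` are hypothetical names. An entry that wraps onto a second line inside its column still comes out as separate cells.

```python
def split_columns(lines: list[str], starts: list[int]) -> list[list[str]]:
    """Slice each line at the given column start offsets.

    Returns one list per column, holding that column's non-empty cells top to bottom.
    """
    columns: list[list[str]] = [[] for _ in starts]
    # Each column runs from its own start offset to the next column's start
    # (None means "to the end of the line" for the last column).
    ends = [*starts[1:], None]
    for line in lines:
        for index, (start, end) in enumerate(zip(starts, ends)):
            cell = line[start:end].strip()
            if cell:
                columns[index].append(cell)
    return columns


def reading_order(lines: list[str], starts: list[int]) -> list[str]:
    """Flatten the columns so column 1 is read fully before column 2, and so on."""
    return [cell for column in split_columns(lines, starts) for cell in column]


if __name__ == "__main__":
    # Made-up two-column page where the second column starts at character 20.
    page = [
        "Acme Ltd".ljust(20) + "Beta Co",
        "info@acme.example".ljust(20) + "sales@beta.example",
    ]
    print(reading_order(page, [0, 20]))
```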

    I think you should just ask the Chamber of Commerce if they have an Excel spreadsheet they could send to you. They must have the data in some usable format; otherwise they would never have gotten it into a printed publication.
     
  10. Squashman

    Squashman Trusted Advisor

    Joined:
    Apr 4, 2003
    Messages:
    19,783
    I tried Nitro's PDF to Excel. That didn't work either.
     
  11. DaveA

    DaveA Trusted Advisor Spam Fighter

    Joined:
    Nov 16, 1999
    Messages:
    15,596
    First Name:
    David
    Squashman,
    That is one nice method of protecting the data from being harvested, and I have found no way around it except to print it and then OCR the printout.
     
  12. Noyb

    Noyb Trusted Advisor

    Joined:
    May 25, 2005
    Messages:
    20,773
    Microsoft Office reads it just fine. Have you got M$ Office?
    Office 2003 prefers a 600 dpi scan in TIFF format.
     

    Attached Files:

    • qq.doc (20 KB)
  13. Squashman

    Squashman Trusted Advisor

    Joined:
    Apr 4, 2003
    Messages:
    19,783
    How are you getting it into a Word document from the PDF they have on their website?
    Where are the other columns?

    I think his intended goal is to harvest the email addresses for a mailing list.
     
  14. Noyb

    Noyb Trusted Advisor

    Joined:
    May 25, 2005
    Messages:
    20,773
    I just read (OCR'd) the image WiseDave attached in M$ Office and attached a small part of it.
    If it were a PDF from a website, I'd have IrfanView convert it to a TIFF image for Office to read.
    (Actually, I'd use Photoshop.)
    Getting the columns right and splitting out the email addresses from the doc is another problem.
    I might try Find/Replace to change the carriage returns to tabs and paste the result into Excel columns.
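Noyb's Find/Replace idea can be sketched in Python. The sketch assumes each directory entry in the OCR text is separated from the next by a blank line, which is an assumption about the layout that the thread does not confirm. Each entry's lines are joined with tabs, so pasting the result into Excel puts one entry per row. `records_to_rows` is a hypothetical name.

```python
def records_to_rows(text: str) -> list[str]:
    """Join the lines of each blank-line-separated entry with tabs.

    Produces one spreadsheet row per entry. Assumes entries are separated by
    at least one blank line.
    """
    rows: list[str] = []
    current: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the entry collected so far.
            rows.append("\t".join(current))
            current = []
    if current:  # the last entry may not be followed by a blank line
        rows.append("\t".join(current))
    return rows


if __name__ == "__main__":
    ocr_text = "Acme Ltd\n555-0100\ninfo@acme.example\n\nBeta Co\nsales@beta.example\n"
    print("\n".join(records_to_rows(ocr_text)))
```

Unlike a blanket "replace every carriage return with a tab", this keeps the entries apart, so each one lands on its own row.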
     
  15. Squashman

    Squashman Trusted Advisor

    Joined:
    Apr 4, 2003
    Messages:
    19,783
    So if there are 3 columns per page, why isn't it showing all 3 columns from the original document?
     

Short URL to this thread: https://techguy.org/1008091
