1. Computer problem? Tech Support Guy is completely free -- paid for by advertisers and donations. Click here to join today! If you're new to Tech Support Guy, we highly recommend that you visit our Guide for New Members.

Convert HTML to Text in Excel Cells

Discussion in 'Business Applications' started by Oak34, Feb 5, 2009.

Thread Status:
Not open for further replies.
Advertisement
  1. Oak34

    Oak34 Thread Starter

    Joined:
    Feb 5, 2009
    Messages:
    4
    I have a column of about 10,000 different cells that are all coded in HTML. I want to strip the HTML from these cells so I can work with just the text the HTML would create. Please help! I started doing it manually until I realized it would take me about a month! :(
     
  2. slurpee55

    slurpee55

    Joined:
    Oct 20, 2004
    Messages:
    7,837
    I hate (almost) to fall back on my beloved friend once again, but I would suggest you get the great FREE add-in for Excel, ASAP utilities from http://www.asap-utilities.com/
    Once you have it installed, make a copy of the worksheet you are trying to clean up. Than open up Excel and ASAP should open up automatically. If you go to 13 - Web and then to 1 - Clean Web imported data and run that on your copy of your data.
    It works pretty darn well, even on oddly formatted pages; a single column should clean up really well.
     
  3. Oak34

    Oak34 Thread Starter

    Joined:
    Feb 5, 2009
    Messages:
    4
    Thanks for the advice. I ran the program and it didn't quite do the trick. An example of the text I am trying to extract is below. The code is first and below it is how it would look on a website. I want to extract just the text that appears so that my final result would be "TitleSmallText1SmallText2MainText" in an Excel cell. Thanks ~ Oak

    CODE:

    <div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> <p class="MsoNormal">Main Text</p> <br></div>

    HOW IT LOOKS ON A WEBSITE:

    Title
    Small Text 1
    Small Text 2

    Main Text
     
  4. slurpee55

    slurpee55

    Joined:
    Oct 20, 2004
    Messages:
    7,837
    Well, another program that I have - and it is good for this - is NoteTab Light.
    I entered the above HTML into it and this is the result:
    Title
    Small Text 1
    Small Text 2

    Main Text
    It stripped out the bold as well.
    You just download the program from http://www.notetab.com/download.php and copy your column of data and paste it into a new page.
    Go to Modify, Strip HTML tags, Remove all tags. There you go!
    Copy it and paste it back into Excel.
     
  5. Oak34

    Oak34 Thread Starter

    Joined:
    Feb 5, 2009
    Messages:
    4
    Thanks. Using that, I can copy and paste each cell and have the HTML stripped. I was hoping that a program save the copying and pasting of 10,000 cells.

    If you can think of anything else that might do that, I am all ears. Thank you again for your help.

    ~Oak
     
  6. EAFiedler

    EAFiedler Moderator

    Joined:
    Apr 25, 2000
    Messages:
    14,002
    Hi Oak34

    I filled 30 rows with your example:
    <div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> <p class="MsoNormal">Main Text</p> <br></div>



    Using Find All and Replace All, I replaced: <*> with no space.
    My result was:
    TitleSmall Text 1Small Text 2 Main Text


    I ran the Find All and Replace All one more time to remove the spaces.
    To get the final result: TitleSmallText1SmallText2MainText
     
  7. turbodante

    turbodante

    Joined:
    Dec 19, 2008
    Messages:
    744

    Can you open the html in a browser and copy and paste from there?
     
  8. slurpee55

    slurpee55

    Joined:
    Oct 20, 2004
    Messages:
    7,837
    Yes, <*> and replacing it with a space gives you:
    Title Small Text 1 Small Text 2 Main Text
    rather than the results I showed in #4
    so the real question is, what format do you want this data in?
    If it is all text with no headers, etc. and you just want to strip out the <some statement> commands such as <STRONG>, then Replace All should work.
     
  9. Oak34

    Oak34 Thread Starter

    Joined:
    Feb 5, 2009
    Messages:
    4
    Wow. The <*> just saved me about a month of work. Truly appreciated. ~ Oak
     
  10. Sponsor

As Seen On
As Seen On...

Welcome to Tech Support Guy!

Are you looking for the solution to your computer problem? Join our site today to ask your question. This site is completely free -- paid for by advertisers and donations.

If you're not already familiar with forums, watch our Welcome Guide to get started.

Join over 733,556 other people just like you!

Loading...
Thread Status:
Not open for further replies.

Short URL to this thread: https://techguy.org/797849