Convert HTML to Text in Excel Cells

Oak34 (thread starter), 05-Feb-2009, 04:11 PM, #1
I have a column of about 10,000 different cells that are all coded in HTML. I want to strip the HTML from these cells so I can work with just the text the HTML would create. Please help! I started doing it manually until I realized it would take me about a month!

slurpee55, 05-Feb-2009, 07:33 PM, #2
I hate (almost) to fall back on my beloved friend once again, but I would suggest you get the great FREE add-in for Excel, ASAP Utilities, from http://www.asap-utilities.com/
Once you have it installed, make a copy of the worksheet you are trying to clean up. Then open Excel; ASAP should load automatically. Go to 13 - Web, then to 1 - Clean Web imported data, and run that on the copy of your data.
It works pretty darn well, even on oddly formatted pages; a single column should clean up really well.

Oak34 (thread starter), 05-Feb-2009, 07:58 PM, #3
Thanks for the advice. I ran the program and it didn't quite do the trick. An example of the text I am trying to extract is below. The code is first and below it is how it would look on a website. I want to extract just the text that appears so that my final result would be "TitleSmallText1SmallText2MainText" in an Excel cell. Thanks ~ Oak

CODE:

<div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> <p class="MsoNormal">Main Text</p> <br></div>

HOW IT LOOKS ON A WEBSITE:

Title
Small Text 1
Small Text 2

Main Text
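For reference, the same extraction can be scripted outside Excel. This is a minimal sketch using Python's standard-library html.parser (an assumption; nothing in this thread uses Python): it keeps only the text between tags, then deletes all whitespace to reach the "TitleSmallText1SmallText2MainText" form asked for above. The names TextExtractor and visible_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment; every tag is discarded."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags (including whitespace-only runs).
        self.parts.append(data)


def visible_text(fragment):
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


cell = ('<div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> '
        '<p class="MsoNormal">Main Text</p> <br></div>')

# Remove every space, tab and newline from the extracted text.
print("".join(visible_text(cell).split()))  # TitleSmallText1SmallText2MainText
```

A parser is more forgiving than a pattern match when a tag contains unusual content, such as a ">" inside a quoted attribute value.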

slurpee55, 05-Feb-2009, 08:30 PM, #4
Well, another program that I have - and it is good for this - is NoteTab Light.
I entered the above HTML into it and this is the result:
Title
Small Text 1
Small Text 2

Main Text
It stripped out the bold as well.
You just download the program from http://www.notetab.com/download.php and copy your column of data and paste it into a new page.
Go to Modify, Strip HTML tags, Remove all tags. There you go!
Copy it and paste it back into Excel.
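NoteTab's output keeps the line structure rather than running everything together. Here is a rough Python approximation, assuming that <br> and the closing </p> and </div> tags are the only tags that should become line breaks. NoteTab's actual rules may differ, and the names used here are hypothetical.

```python
from html.parser import HTMLParser


class LineBreakTextExtractor(HTMLParser):
    """Keeps text; turns <br> and the end of <p>/<div> blocks into newlines."""

    BLOCK_ENDS = {"p", "div"}  # assumption: closing tags that end a line

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":  # html.parser reports tag names in lowercase
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_ENDS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def text_with_line_breaks(fragment):
    parser = LineBreakTextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    # Trim stray spaces on each line, then drop leading/trailing blank lines.
    return "\n".join(line.strip() for line in lines).strip()


cell = ('<div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> '
        '<p class="MsoNormal">Main Text</p> <br></div>')
print(text_with_line_breaks(cell))
```

On the example cell this prints the same five lines (including the blank line before "Main Text") shown in the NoteTab result.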

Oak34 (thread starter), 05-Feb-2009, 09:00 PM, #5
Thanks. Using that, I can copy and paste each cell and have the HTML stripped. I was hoping for a program that would save me copying and pasting 10,000 cells.

If you can think of anything else that might do that, I am all ears. Thank you again for your help.

~Oak
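One way to avoid per-cell copying is to save the column from Excel as a CSV file, run a short script over every row, and open the result back in Excel. This is a sketch under assumptions: the script name, the file names, UTF-8 encoding and the choice of column 0 are all hypothetical. The tag pattern <[^>]*> is a plain regular expression, not a full HTML parser, so a ">" inside an attribute value would end the match early.

```python
import csv
import html
import re
import sys

# "<", then any run of characters other than ">", then ">".
TAG = re.compile(r"<[^>]*>")


def strip_html(cell):
    """Delete tags, then decode entities such as &amp; into plain characters."""
    return html.unescape(TAG.sub("", cell))


def convert_rows(rows, column=0):
    """Yield each row with the HTML in `column` replaced by plain text."""
    for row in rows:
        row = list(row)  # copy so the caller's data is left untouched
        if len(row) > column:
            row[column] = strip_html(row[column])
        yield row


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python strip_column.py cells.csv cells_text.csv
    # Assumes the CSV was saved as UTF-8; adjust the encoding if Excel used another.
    with open(sys.argv[1], newline="", encoding="utf-8") as src, \
         open(sys.argv[2], "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(convert_rows(csv.reader(src)))
```

Other columns pass through unchanged, so the cleaned file can be opened in place of the original sheet.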

EAFiedler (moderator), 06-Feb-2009, 01:10 AM, #6
Hi Oak34

I filled 30 rows with your example:
<div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> <p class="MsoNormal">Main Text</p> <br></div>

Using Find and Replace, I replaced <*> with nothing (not even a space) using Replace All.
My result was:
TitleSmall Text 1Small Text 2 Main Text

I then ran Replace All one more time, replacing a space with nothing, to remove the spaces.
That gave the final result: TitleSmallText1SmallText2MainText
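The two Replace All passes can be mirrored with regular expressions. This is an approximation, not Excel's own matching: the results in this post show Excel treating each tag as a separate match, so the sketch uses <[^>]*> rather than <.*>. In a regex, .* is greedy, and <.*> would match from the first "<" to the last ">" in the cell, deleting everything.

```python
import re

# Approximates Excel's "<*>" wildcard: "<", any run of non-">" characters, ">".
TAG = re.compile(r"<[^>]*>")

cell = ('<div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> '
        '<p class="MsoNormal">Main Text</p> <br></div>')

step1 = TAG.sub("", cell)       # first Replace All: every tag -> nothing
step2 = step1.replace(" ", "")  # second Replace All: every space -> nothing
print(step2)  # TitleSmallText1SmallText2MainText
```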

turbodante, 06-Feb-2009, 06:31 AM, #7
Quote (originally posted by Oak34):
I have a column of about 10,000 different cells that are all coded in HTML. I want to strip the HTML from these cells so I can work with just the text the HTML would create. Please help! I started doing it manually until I realized it would take me about a month!

Can you open the html in a browser and copy and paste from there?

slurpee55, 06-Feb-2009, 08:31 AM, #8
Yes, finding <*> and replacing it with a space gives you:
Title Small Text 1 Small Text 2 Main Text
rather than the result I showed in #4.
So the real question is: what format do you want this data in?
If it is all text with no headers, etc., and you just want to strip out <some statement> commands such as <STRONG>, then Replace All should work.
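To get the space-separated version, replace each tag with a space and then collapse runs of whitespace. A single Replace All with a space leaves several spaces wherever tags were adjacent; in Excel, the TRIM function can reduce those to single spaces afterwards. A Python sketch of both steps, with a hypothetical function name:

```python
import re

TAG = re.compile(r"<[^>]*>")     # "<", any non-">" characters, ">"
WHITESPACE = re.compile(r"\s+")  # any run of spaces, tabs or newlines


def tags_to_spaces(cell):
    """Replace each tag with a space, then squeeze repeated whitespace (like TRIM)."""
    spaced = TAG.sub(" ", cell)
    return WHITESPACE.sub(" ", spaced).strip()


cell = ('<div align="center"><b>Title<br></b>Small Text 1<br>Small Text 2<br><br> '
        '<p class="MsoNormal">Main Text</p> <br></div>')
print(tags_to_spaces(cell))  # Title Small Text 1 Small Text 2 Main Text
```

Keeping a separator between what used to be separate lines makes the text easier to split or search later than the fully squeezed form.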

Oak34 (thread starter), 06-Feb-2009, 12:41 PM, #9
Wow. The <*> just saved me about a month of work. Truly appreciated. ~ Oak