Hello,
I have a problem extracting specific phrases from a list of 500+ rows. All the rows follow a similar pattern to the 6 rows below:
row1> We sell big and small blue widgets at http://www.bluewidgetsdomain.com/
row2> Our website is http://www.bluewidgetsdomain.com/
row3> We sell many kinds of widgets. Go to this site for green widgets at http://www.green-widgets-domain.net/
row4> Our website is http://www.green-widgets-domain.net/
row5> We sell widgets. Check out red widgets at http://www.red-widgets-domain.org/
row6> Our website is http://www.red-widgets-domain.org/
Qn 1) How can I extract the words bluewidgetsdomain, green-widgets-domain and red-widgets-domain from each row and delete the rest of the text?
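If the rows can be exported to plain text and processed with a script, a regular expression can pull out just the domain name. The sketch below is a minimal example under stated assumptions: each row is one string with the URL at the end, and URLs look like http://www.<name>.<tld>/. The names `ROWS`, `DOMAIN_NAME` and `domain_name` are hypothetical, chosen for illustration.

```python
import re

# Assumption: each row is one string, with the URL at the end of the row.
ROWS = [
    "We sell big and small blue widgets at http://www.bluewidgetsdomain.com/",
    "Our website is http://www.bluewidgetsdomain.com/",
    "We sell many kinds of widgets. Go to this site for green widgets at http://www.green-widgets-domain.net/",
    "Our website is http://www.green-widgets-domain.net/",
    "We sell widgets. Check out red widgets at http://www.red-widgets-domain.org/",
    "Our website is http://www.red-widgets-domain.org/",
]

# Capture the host part between an optional "www." and the final ".<tld>".
DOMAIN_NAME = re.compile(r"https?://(?:www\.)?([^/\s]+)\.[A-Za-z]+(?=/|\s|$)")


def domain_name(row):
    """Return just the domain name from a row, or None if no URL is found."""
    match = DOMAIN_NAME.search(row)
    return match.group(1) if match else None


# Replace each row with its domain name, discarding the rest of the text.
names = [domain_name(row) for row in ROWS]
print(names)
```

Because only the last ".<tld>" is dropped, a host such as www.example.co.uk would come out as example.co, so the pattern may need adjusting if the real list contains such domains.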
Qn 2) For the rows that contain the phrase [widgets at], I want to extract everything after [widgets at] so I get a list of the domain names. How can I do it?
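A minimal Python sketch, assuming the same one-string-per-row layout: `str.partition` splits a row at the first occurrence of the phrase, and rows without the phrase are skipped. The helper name `after_phrase` is hypothetical.

```python
# Assumption: each row is one string, with the URL at the end of the row.
ROWS = [
    "We sell big and small blue widgets at http://www.bluewidgetsdomain.com/",
    "Our website is http://www.bluewidgetsdomain.com/",
    "We sell many kinds of widgets. Go to this site for green widgets at http://www.green-widgets-domain.net/",
    "Our website is http://www.green-widgets-domain.net/",
    "We sell widgets. Check out red widgets at http://www.red-widgets-domain.org/",
    "Our website is http://www.red-widgets-domain.org/",
]


def after_phrase(row, phrase="widgets at"):
    """Return the stripped text after the first occurrence of phrase, or None."""
    _, found, rest = row.partition(phrase)
    return rest.strip() if found else None


# Keep only rows that contain the phrase; what follows it is the domain.
domains = [text for text in (after_phrase(row) for row in ROWS) if text is not None]
print(domains)
```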
Qn 3) I want to extract only the domains ending with .com. (For example, in the rows above, only http://www.bluewidgetsdomain.com/ will be extracted.)
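One way to do this in Python, sketched under the same assumption that each row is a single string: find every URL in the row, parse it with `urllib.parse.urlsplit`, and keep it only if its host name ends in ".com". Duplicates (the same URL appears in two rows) are dropped. `com_urls` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit

# Assumption: each row is one string, with the URL at the end of the row.
ROWS = [
    "We sell big and small blue widgets at http://www.bluewidgetsdomain.com/",
    "Our website is http://www.bluewidgetsdomain.com/",
    "We sell many kinds of widgets. Go to this site for green widgets at http://www.green-widgets-domain.net/",
    "Our website is http://www.green-widgets-domain.net/",
    "We sell widgets. Check out red widgets at http://www.red-widgets-domain.org/",
    "Our website is http://www.red-widgets-domain.org/",
]

# A URL is taken to be "http://" or "https://" followed by non-whitespace.
URL = re.compile(r"https?://\S+")


def com_urls(rows):
    """Return each distinct URL whose host name ends in .com, in first-seen order."""
    found = []
    for row in rows:
        for url in URL.findall(row):
            host = urlsplit(url).hostname or ""
            if host.endswith(".com"):
                found.append(url)
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(found))


print(com_urls(ROWS))
```

Checking the parsed host name rather than the raw string avoids matching ".com" that appears somewhere in a path.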
Qn 4) I want to extract the words between [We sell] and [at]. (For example, for row 1 the extracted words will be [big and small blue widgets]; for row 3, [many kinds of widgets. Go to this site for green widgets]; and for row 5, [widgets. Check out red widgets].)
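A possible regular-expression approach in Python, assuming [at] is always the word right before the URL: a lazy group captures the text after "We sell" up to the first " at" that is immediately followed by a URL. `BETWEEN` and `sell_phrase` are hypothetical names.

```python
import re

# Assumption: each row is one string, with the URL at the end of the row.
ROWS = [
    "We sell big and small blue widgets at http://www.bluewidgetsdomain.com/",
    "Our website is http://www.bluewidgetsdomain.com/",
    "We sell many kinds of widgets. Go to this site for green widgets at http://www.green-widgets-domain.net/",
    "Our website is http://www.green-widgets-domain.net/",
    "We sell widgets. Check out red widgets at http://www.red-widgets-domain.org/",
    "Our website is http://www.red-widgets-domain.org/",
]

# Lazy (.+?) stops at the first " at" that is directly followed by a URL.
BETWEEN = re.compile(r"We sell (.+?) at\s+https?://")


def sell_phrase(row):
    """Return the words between "We sell" and "at", or None if absent."""
    match = BETWEEN.search(row)
    return match.group(1) if match else None


for row in ROWS:
    phrase = sell_phrase(row)
    if phrase is not None:
        print(phrase)
```

Tying [at] to the URL keeps the match from ending early if "at" also appears as a word inside the description.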
Qn 5) If the domain has dashes, I want to remove the dashes. (For example, http://www.green-widgets-domain.net/ will become http://www.greenwidgetsdomain.net/.)
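A minimal Python sketch: match only the scheme and host of each URL and delete the dashes inside that match, so any dashes in the surrounding text or in a URL path are left alone. `HOST` and `remove_domain_dashes` are hypothetical names.

```python
import re

# Matches the scheme and host only (up to the first "/" or whitespace),
# so dashes outside the host name are not touched.
HOST = re.compile(r"https?://[^\s/]+")


def remove_domain_dashes(row):
    """Remove "-" characters from the host part of every URL in the row."""
    return HOST.sub(lambda m: m.group(0).replace("-", ""), row)


print(remove_domain_dashes("http://www.green-widgets-domain.net/"))
# http://www.greenwidgetsdomain.net/
```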
Qn 6) I want to remove the trailing slash from all the domains. (For example, http://www.green-widgets-domain.net/ will become http://www.green-widgets-domain.net.)
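In Python this can be sketched as a substitution over every URL in the row that strips "/" characters from the end of the URL only. It assumes a URL ends at the next whitespace; the names `URL` and `strip_trailing_slash` are hypothetical.

```python
import re

# A URL is taken to be "http://" or "https://" followed by non-whitespace.
URL = re.compile(r"https?://\S+")


def strip_trailing_slash(row):
    """Remove any "/" characters at the very end of each URL in the row."""
    return URL.sub(lambda m: m.group(0).rstrip("/"), row)


print(strip_trailing_slash("http://www.green-widgets-domain.net/"))
# http://www.green-widgets-domain.net
```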
Qn 7) How do I delete all rows that start with [Our website]?
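If the rows are processed in Python, a list comprehension can keep only the rows that do not start with the phrase. This is a sketch assuming one string per row; leading spaces are ignored via `lstrip()`.

```python
# Assumption: each row is one string, with the URL at the end of the row.
ROWS = [
    "We sell big and small blue widgets at http://www.bluewidgetsdomain.com/",
    "Our website is http://www.bluewidgetsdomain.com/",
    "We sell many kinds of widgets. Go to this site for green widgets at http://www.green-widgets-domain.net/",
    "Our website is http://www.green-widgets-domain.net/",
    "We sell widgets. Check out red widgets at http://www.red-widgets-domain.org/",
    "Our website is http://www.red-widgets-domain.org/",
]

# Keep every row that does not begin with "Our website".
kept = [row for row in ROWS if not row.lstrip().startswith("Our website")]

for row in kept:
    print(row)
```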
I appreciate any help. Thanks in advance!