Need Utility For Complex Batch Replace Of Millions Of Files

Status
This thread has been Locked and is not open to further replies. Please start a New Thread if you're having a similar issue. View our Welcome Guide to learn how to use this site.

demimetacalf

Thread Starter
Joined
Dec 27, 2010
Messages
13
I have a couple of million .html files to change, so I need a batch way of doing them all at once. The change seems simple, but it is turning out to be very difficult to actually get done right. Here is the situation. The files currently have this content:

<title>123 Product XYZ</title>
<meta name="description" content="Blah, blah, blah">

What I want is to end up with this content:

<title>123 Product XYZ</title>
<meta name="description" content="Blah, blah, blah 123 Product XYZ">

Can't figure out any way to do it. Please note that although I've been around computers since they were hacked from granite, I am not a programmer so I'm stuck. Any assistance would be appreciated! Happy holidays! :)
 

valis

Moderator
Joined
Sep 24, 2004
Messages
78,021
Let me know if you don't get a response in the next 24 hours or so; if you don't, I'll flag down one of our resident coders. I'm also going to move this to business apps.

thanks,

v
 
TheOutcaste

Joined
Aug 7, 2007
Messages
9,028
In the html files you want to read the product name from the title tags, and add it to the end of the description line?

Take the product name from the title and add it to the description as shown:
<title>123 Product XYZ</title>
<meta name="description" content="Blah, blah, blah">

<title>123 Product XYZ</title>
<meta name="description" content="Blah, blah, blah 123 Product XYZ">

Are these two lines always one right after the other, or could there be intervening lines?
Is there a following line that will always be present?
Can you zip up and attach one or two of the files, assuming it won't violate any privacy concerns?

To attach a file:
  • If using Quick Reply, click the Go Advanced button under the Reply box.
  • Click the Paperclip at the top of the editor window, or scroll down and click the Manage Attachments button in the Additional Options section (you may have to expand it).
  • Click the Browse... button and browse to your file
  • Click the Upload button
  • Repeat for any more files, then close the Manage Attachments window
 

demimetacalf

Thread Starter
Joined
Dec 27, 2010
Messages
13
Thanks for the prompt reply. I'm attaching a simplified version of the html in the attachment. The <head> section is exactly as it is in the actual megafiles. You'll see that the two lines are attached and that is the way they are always present, with no intervening lines or characters. The next characters are also always present as is. I really appreciate your help! Thanks! :)
 


TheOutcaste

Joined
Aug 7, 2007
Messages
9,028
Couple more questions:
The sample has this:
<title> 123 Product XYZ </title>
There is a leading and trailing space there. Do you want them removed when the title is added to the description, or left as is?
And this part:
content="Blah Blah Blah."

Is that ending period always present, and do you want the product name before the period or after it?
 

demimetacalf

Thread Starter
Joined
Dec 27, 2010
Messages
13
We can just leave the spaces as they are and we can also leave the period where it is at the end of the Blah, so the product comes after the period, naturally with a space. Thanks again for your continued interest! :)
 
TheOutcaste

Joined
Aug 7, 2007
Messages
9,028
That makes it easier.
This will find all html files in the source folder, make the change, and write the new file in the destination folder.
The new file will have the same name and relative path. The original file will be unchanged.
Very easy to set it to overwrite the current file, but this is a bit safer for testing.
It is recursive, so it will find all files in the Source folder and its subfolders.
It will output every 5th file name to the title bar as a progress indicator. This does slow it down a bit, so you may want to increase that to a higher number (edit the 5 in the Set _Count= assignment inside the loop), or it can be removed entirely.
Edit the Set _Source and Set _Dest lines to point to the Source and Destination folders. The Destination folders will be created if they don't exist.
Do not set the source and destination to the same folder; it won't work (it would delete all the files). The program checks for that.
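
Separately from the script itself, the path mirroring described above (same name and relative path under the destination, plus the same-folder guard) can be sketched in a few lines. This is only an illustration in Python, not part of the batch file. The names `mirror_path` and `plan_copies` are hypothetical, and the guard compares resolved paths rather than doing the script's case-insensitive string comparison.

```python
from pathlib import Path, PurePath, PureWindowsPath


def mirror_path(source_root: PurePath, dest_root: PurePath, file_path: PurePath) -> PurePath:
    """Return where file_path would land under dest_root, keeping its relative path."""
    return dest_root / file_path.relative_to(source_root)


def plan_copies(source_root: Path, dest_root: Path) -> list[tuple[Path, Path]]:
    """Pair every .html file under source_root with its mirrored destination path."""
    source_root, dest_root = source_root.resolve(), dest_root.resolve()
    if source_root == dest_root:
        # Same idea as the script's check: output must never land on top of input.
        raise ValueError("Source and destination are the same folder")
    return [
        (src, mirror_path(source_root, dest_root, src))
        for src in sorted(source_root.rglob("*.html"))  # recursive, like Dir /S
        if src.is_file()
    ]


if __name__ == "__main__":
    # Pure paths do no file system access, so this line runs on any OS.
    print(mirror_path(
        PureWindowsPath(r"C:\Test Source"),
        PureWindowsPath(r"C:\Test Dest"),
        PureWindowsPath(r"C:\Test Source\sub\page.html"),
    ))
```

The final call prints `C:\Test Dest\sub\page.html`: only the root folder changes, and everything below it is kept.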

Give this a try with a few test files first of course.

Copy the text in the following code block into Notepad.
Save this someplace with a .cmd extension.
Be sure to change the Save as Type: box to All Files when saving.
Double click it to run the file.

Code:
@Echo Off
SetLocal EnableDelayedExpansion
Rem Edit the next two lines to point to the Source and Destination folders
Set _Source=C:\Test Source
Set _Dest=C:\Test Dest
If /I "%_Source%"=="%_Dest%" Echo.Error. Source and Destination are the Same&Pause&Goto :EOF
Set _VBFile=%temp%\SNR.vbs
Call :_MakeVBS
Set _Count=1
For /F "Tokens=* Delims=" %%I In ('Dir /A-D /B /S "%_Source%\*.html"') Do (
  Set _OutPath=%%~dpI
  Set _OutPath=!_OutPath:%_Source%=%_Dest%!
  Set /A _Count-=1
  If !_Count!==0 Title Processing %%I & Set _Count=5
  If Not Exist "!_OutPath!" md "!_OutPath!"
  cscript /nologo "%_VBFile%" "%%I" "!_OutPath!\%%~nxI"
)
Del "%_VBFile%"
Goto :EOF
:::::::::::::::::::::::::::::::::::::::::::::::::::
:: Make VBS File
:::::::::::::::::::::::::::::::::::::::::::::::::::
:_MakeVBS
(Echo.Const ForReading = 1
Echo.Const ForWriting = 2
Echo.
Echo.StrFileName = Wscript.Arguments.Item^(0^)
Echo.StrOutfName = Wscript.Arguments.Item^(1^)
Echo.Set objFSO = CreateObject^("Scripting.FileSystemObject"^)
Echo.' Delete output file if it exists
Echo.If objFSO.FileExists^(StrOutfName^) Then objFSO.DeleteFile^(StrOutfName^)
Echo.
Echo.' Set search pattern
Echo.Set objRegEx = CreateObject^("VBScript.RegExp"^)
Echo.objRegEx.IgnoreCase = True
Echo.objRegEx.Global = True
Echo.objRegEx.Pattern = "<meta name=""description"" content=(.+)"">(.*)<title>(\s*)(.+)</title>"
Echo.strRepPatrn = "<meta name=""description"" content=$1 $4"">$2<title>$3$4</title>"
Echo.' Open a file
Echo.Set objFile = objFSO.OpenTextFile^(StrFileName,ForReading^)
Echo.strContents = objFile.ReadAll
Echo.objFile.Close
Echo.strNewStr = objRegEx.Replace^(strContents, strRepPatrn^)
Echo.Set objOutputFile = objFSO.CreateTextFile^(StrOutfName^)
Echo.objOutputFile.WriteLine strNewStr
Echo.objOutputFile.Close)>"%_VBFile%"
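
To see what the search and replace patterns in the script do, here is a small sketch that applies the same regular expression to one sample line. It is written in Python rather than VBScript purely for illustration, so minor differences between the two regex engines are possible. The sample line is an assumption pieced together from the snippets quoted earlier in the thread, with the description tag directly followed by the title tag on one line, which is the layout the pattern looks for. The function name `append_title_to_description` is hypothetical.

```python
import re

# Same pattern as in the script, written as a Python raw string.
# Group 1: the description text including its opening quote (closing quote excluded)
# Group 2: anything between the description tag and the title tag
# Group 3: leading whitespace inside the title
# Group 4: the title text (trailing spaces included)
PATTERN = re.compile(
    r'<meta name="description" content=(.+)">(.*)<title>(\s*)(.+)</title>',
    re.IGNORECASE,
)
# Python spells the script's $1..$4 as \g<1>..\g<4>.
REPLACEMENT = (
    r'<meta name="description" content=\g<1> \g<4>">'
    r'\g<2><title>\g<3>\g<4></title>'
)


def append_title_to_description(html: str) -> str:
    """Append the title text to the description content, leaving the title as is."""
    return PATTERN.sub(REPLACEMENT, html)


# Assumed sample line, built from the fragments quoted in the thread.
sample = (
    '<meta name="description" content="Blah Blah Blah.">'
    '<title> 123 Product XYZ </title>'
)
print(append_title_to_description(sample))
```

This prints `<meta name="description" content="Blah Blah Blah. 123 Product XYZ "><title> 123 Product XYZ </title>`, so the product name lands after the period and the title's spaces are kept, as agreed above. Text where the title tag comes before the description tag would pass through this pattern unchanged, which is one more reason to try a few real files first.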
 

valis

Moderator
Joined
Sep 24, 2004
Messages
78,021
TheOutcaste, mind if I ask how you picked this up? Self-taught, picked up over a period of experimentation, or is there some bible out there that you refer to?

You continue to amaze, and I'll leave it at that.

cheers, and happy new year.
 

demimetacalf

Thread Starter
Joined
Dec 27, 2010
Messages
13
I'm totally astounded at that code. I haven't had a chance to work it, but I'll do it first thing in the morning and report back. All I can say is WOW! Thanks! :)
 

demimetacalf

Thread Starter
Joined
Dec 27, 2010
Messages
13
Awesome! I'm really looking forward to testing the livin' daylights out of it in the morning and I'm pretty sure that there will be no ugly old 1970s AMC compact cars in there! :)
 