Welcome to TSG!
You're on the right track, ColleenR.
You just need to add the /R switch to tell FindStr to treat the search string as a regular expression. On its own, /C: specifies a literal string, so only use it together with /R. You do still need it here because the search string contains a space: without /C:, FindStr splits the string at the space and matches lines that contain either part.
I also find it easier to put the redirection at the start of the line, rather than in the middle:
Code:
>H:\DisposalDatePages.txt Findstr /I /R /M /C:"\"ato_date_disposal\" content=\"..*\"" *.htm
If you need to verify that what is in the quotes after content= is actually a date, the following will get close.
There are limits to the length of a search string with Findstr, so it won't let me verify the full time format, only up to the first character of the seconds.
This will find a line that contains the following string:
"ato_date_disposal" content="dd MMM YYYY HH:MM:S
followed by a single character, then
">
In other words, it checks that the quotes after content= contain a two-digit day starting with 0-3, a space, a three-letter month, a space, a four-digit year starting with 1 or 2, a space, a two-digit hour starting with 0, 1, or 2, a colon, a two-digit minute starting with 0-5, a colon, a digit 0 through 5, and a single character of any kind. (The pattern uses a period where each space goes, so strictly speaking any character is accepted in those positions, and the month is only checked against the letters that occur in month abbreviations.)
If there is no date in the content field (i.e., you have content="" or content="Some Text"), it will not output the file name to the DisposalDatePages.txt file.
This requires the day, hour, and minute fields to always be two digits.
Code:
@Echo Off
>Search.txt Echo.\"ato_date_disposal\" content=\"[0-3][0-9].[ADFJMNOS][aceopu][bcglnprtvy].[1-2][0-9][0-9][0-9].[0-2][0-9]:[0-5][0-9]:[0-5].\"^>
>H:\DisposalDatePages.txt Findstr /I /R /M /G:search.txt *.htm
Del Search.txt
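If you want to see what that pattern accepts and rejects before running it over the real pages, here is a rough Python translation of it. Treat it as an illustration only: Python's re module is not Findstr, re.IGNORECASE stands in for /I, and the sample lines (including the meta name= prefix) are made up for the sketch.

```python
import re

# Rough Python equivalent of the Findstr pattern written to Search.txt.
# Illustration only; re.IGNORECASE plays the role of Findstr's /I switch.
PATTERN = re.compile(
    r'"ato_date_disposal" content="'
    r'[0-3][0-9].[ADFJMNOS][aceopu][bcglnprtvy].'  # dd MMM
    r'[1-2][0-9][0-9][0-9].'                       # YYYY
    r'[0-2][0-9]:[0-5][0-9]:[0-5].'                # HH:MM:S plus any one character
    r'">',
    re.IGNORECASE,
)

# Hypothetical sample lines, not taken from the real pages.
samples = {
    "full date": '<meta name="ato_date_disposal" content="07 Mar 2011 14:05:30">',
    "empty": '<meta name="ato_date_disposal" content="">',
    "text": '<meta name="ato_date_disposal" content="Some Text">',
    "single-digit day": '<meta name="ato_date_disposal" content="7 Mar 2011 14:05:30">',
    "non-digit last second": '<meta name="ato_date_disposal" content="07 Mar 2011 14:05:3x">',
}

results = {label: bool(PATTERN.search(line)) for label, line in samples.items()}
for label, matched in results.items():
    print(f"{label}: {'match' if matched else 'no match'}")
```

The last two samples show the two limitations of the Findstr version: a single-digit day is rejected, and a non-numeric final character is accepted.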
If you need to verify that the last digit is a number, or the day, hour, or minutes could be a single digit, or you need to include the meta name= part, you'd need to use VBScript with a regular expression. I can get to that later today if you really need it. I don't use it much, so I'll have to look up just what is allowed in the RegEx pattern.
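To give an idea of what a fuller regular expression could look like, here is a minimal sketch in Python rather than VBScript (a swapped-in language, chosen only because it is easy to try out). It assumes the tag is written as <meta name="ato_date_disposal" content="...">, with the attributes in that order and English three-letter month names. The function name pages_with_disposal_date is made up for the example.

```python
import re
from pathlib import Path

# Stricter than the Findstr pattern: includes the meta name= part, allows a
# one- or two-digit day, hour and minute, and requires two numeric seconds digits.
# Assumption: attribute order and spacing as described in this thread.
DISPOSAL_RE = re.compile(
    r'<meta\s+name="ato_date_disposal"\s+content="'
    r'(?:0?[1-9]|[12][0-9]|3[01])\s'                          # day 1-31
    r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s'  # month
    r'[12][0-9]{3}\s'                                         # year 1000-2999
    r'(?:[01]?[0-9]|2[0-3]):[0-5]?[0-9]:[0-5][0-9]'           # H:M:SS
    r'"',
    re.IGNORECASE,
)


def pages_with_disposal_date(folder, pattern="*.htm"):
    """Return the sorted names of files in folder that contain a valid disposal date."""
    hits = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        if DISPOSAL_RE.search(text):
            hits.append(path.name)
    return hits
```

Calling pages_with_disposal_date(".") from the folder that holds the pages returns the list of matching file names, which you could then write to DisposalDatePages.txt. It does not check that the day is valid for the month (31 Feb would pass).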
EDIT: Forgot to mention that you also need to use ..* to check for characters inside the quotes. .* matches zero or more characters, so you need the first period to match at least one character.
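The difference is easy to see on an empty content field. This small Python check is only an illustration (Python's re module, not Findstr, and the sample line is made up), but . and * behave the same way in both:

```python
import re

# Hypothetical line with an empty content field.
line = '<meta name="ato_date_disposal" content="">'

zero_or_more = re.search(r'content=".*"', line)   # .* is allowed to match nothing
one_or_more = re.search(r'content="..*"', line)   # the first . must consume a character

print(bool(zero_or_more))  # True
print(bool(one_or_more))   # False
```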
HTH
Jerry