Solved: csv file splitting and unique names

Status
This thread has been Locked and is not open to further replies. Please start a New Thread if you're having a similar issue. View our Welcome Guide to learn how to use this site.

amstro

Thread Starter
Joined
Feb 7, 2013
Messages
10
Hi friends, I need help with editing a csv file. I have multiple rows in a file and I want each row to be transfered to a separate csv file. For ex, if my csv file is as below
crj,ckjck,dbnch,dnj,ch
152,15,15,487
abc,def,ghi,jkl,abc,def,ghi,jkl
........ and so on

I want n files created where n is the number of lines in my csv files. Each csv file will have exactly one line from the source csv file. I found a code elsewhere in this forum (http://forums.techguy.org/software-development/1023949-solved-split-100000-line-csv.html). It does exactly what I need with one exception. It generates 5000 lines into one csv file, while I need a single line. This is an easy fix. I change the variable 5000 to 1 and I got the results. Worked like magic. However, I have another issue; the file name. It uses a standard file name and increments it by 1 and so all file names have the same file name with a number at the end that keeps incrementing. I cannot use this naming convention. I want each file to use the first word from that line as the file name. So in my case. The first file will have name crj.csv, the 2nd will have name 152.csv, the third abc.csv and so on.

I am not a programming guy and I need help with this task. I am very grateful for any help in advance. Thank you so much.
 

amstro

Thread Starter
Joined
Feb 7, 2013
Messages
10
Code:
@echo off setlocal ENABLEDELAYEDEXPANSION REM Edit this value to change the name of the file that needs splitting. Include the extension. SET BFN=BigFile.csv REM Edit this value to change the number of lines per file. SET LPF=5000 REM Edit this value to change the name of each short file. It will be followed by a number indicating where it is in the list. SET SFN=SplitFile REM Do not change beyond this line. SET SFX=%BFN:~-3% SET /A LineNum=0 SET /A FileNum=1 For /F "delims==" %%l in (%BFN%) Do ( SET /A LineNum+=1 echo %%l >> %SFN%!FileNum!.%SFX% if !LineNum! EQU !LPF! ( SET /A LineNum=0 SET /A FileNum+=1 ) ) endlocal Pause
Thanks again!
 

Squashman

Trusted Advisor
Joined
Apr 4, 2003
Messages
19,786
How about putting some CR\LF in there.
Code:
@echo off 
setlocal ENABLEDELAYEDEXPANSION 

REM Edit this value to change the name of the file that needs splitting. Include the extension. 
SET BFN=BigFile.csv 

REM Edit this value to change the number of lines per file. 
SET LPF=5000 

REM Edit this value to change the name of each short file. It will be followed by a number indicating where it is in the list. 
SET SFN=SplitFile 

REM Do not change beyond this line. 
SET SFX=%BFN:~-3% 
SET /A LineNum=0 
SET /A FileNum=1 
For /F "delims==" %%l in (%BFN%) Do ( 
	SET /A LineNum+=1 
	echo %%l >> %SFN%!FileNum!.%SFX% 
	if !LineNum! EQU !LPF! (
		SET /A LineNum=0 
		SET /A FileNum+=1
	)
)
endlocal
Pause
 

Squashman

Trusted Advisor
Joined
Apr 4, 2003
Messages
19,786
Can you guarantee that the first delimited column will never have the same information otherwise you will have files that will overwrite each other.

Don't even need most of this code if you are doing 1 line per output file. It is a much smaller and less complicated batch file as long as you don't have to keep track of possible file overwrites.
 

foxidrive

Banned
Joined
Oct 20, 2012
Messages
793
This should work. If your file.csv has a long filename with spaces etc then rename it for this.

Code:
@echo off
for /f "delims=" %%a in (file.csv) do (
for /f "delims=," %%b in ("%%a") do (
> "%%b.csv" echo %%a
)
)
If doesn't check for existing files, as Squashman has mentioned.
 

amstro

Thread Starter
Joined
Feb 7, 2013
Messages
10
Thanks for the reply Squashman. Yes, I can guarantee that the first word will never be repeated in any of the lines. I will try the code and update back the result.
 

amstro

Thread Starter
Joined
Feb 7, 2013
Messages
10
Thank you foxidrive. I will check and update back with the result. Thanks everyone for helping me out.
 

amstro

Thread Starter
Joined
Feb 7, 2013
Messages
10
Foxidrive, this worked like magic. I get exactly what I needed. I have to ask one more help. How can I make it so that the output csv file has the results in a column fashion and not in a row. For ex, Genius.csv file has the result as Genius,talent,smart. How can I edit your batch file so that the result is
Genius
talent
smart
So basically I will have a big csv file and then once I run the batch file, multiple csv files are created with the first word from each line as the file name and each single line as content of the resulting files and also the resulting files have the data in column format and not in a line

Thanks a billion for your help
 

foxidrive

Banned
Joined
Oct 20, 2012
Messages
793
This uses a second batch file called batch2.bat (see below).
It replaces every comma with a newline sequence

*) save the top batch file
*) save the second batch file and call it batch2.bat
*) put them both in the same folder with file.csv
*) run the first batch file.

See how it goes.


Code:
@echo off
for /f "delims=" %%a in (file.csv) do (
for /f "delims=," %%b in ("%%a") do (
> "a.tmp" echo %%a
call batch2.bat
del "a.tmp"
ren "b.tmp" "%%b.csv"
)
)
batch2.bat

Code:
@set @JScript=1/*
@echo off
setlocal
cscript //nologo //E:JScript "%~f0"
goto :eof
*/
// JScript
//
var ForReading= 1
var ForWriting = 2
var fso = new ActiveXObject("Scripting.FileSystemObject");
// paths must have doubled backslashes if used
// var input = fso.OpenTextFile("c:\\temp\\filein.txt", ForReading)
// var output = fso.OpenTextFile("c:\\temp\\fileout.txt", ForWriting, true)
var input = fso.OpenTextFile("a.tmp", ForReading)
var output = fso.OpenTextFile("b.tmp", ForWriting, true)
var data = input.Readall()
data = data.replace(/\x2c/gi, "\x0D\x0A")
output.Write(data)
input.Close()
output.Close()
 

Squashman

Trusted Advisor
Joined
Apr 4, 2003
Messages
19,786
You could in theory do this all with a pure batch sub-routine by calling the sub-routine with a call statement. The commas will work as the delimiter to the sub routine and you can reference them as %1 etc. You could use the shift statement to continue processing each argument passed to the sub routine and you can keep checking if %1 is not nul to keep processing the input.
 

foxidrive

Banned
Joined
Oct 20, 2012
Messages
793
Yes, Squashman, that's feasible. It does depend on the file content however because poison characters have to have special handling.

The method I used is impervious to any poison characters, but
a possible flaw in my method is that for an entry like this:
one,two,"three pears, and four boxes",five
it will also split after the 'three pears' where it is one double quoted entry.
 

amstro

Thread Starter
Joined
Feb 7, 2013
Messages
10
Foxidrive, that is awesome! This worked exactly like I wanted it to. I am soo so grateful for your help. Considering the amount of work that I would have to do without these batch files, I would have stayed in my lab straight for a whole month. Thanks a million. I do have one more task at hand. But I will give it a shot based on your batch files and report back. Thanks again for all your help.
 

amstro

Thread Starter
Joined
Feb 7, 2013
Messages
10
Foxidrive, I really need your help again. I have this file which is just one big long statement. Imagine a text file about 3 MB with all kinds of text and special characters and everything (it almost take 20 to 30 secs for notepad to open it). The guy never hit the enter key and so there is just one big line. I have to separate this long statement into several statements. A statement starts at the first character in the file and ends at a semicolon. The next line starts after the semicolon and ends at the next semi-colon, and then the next line starts and ends at the next semicolon and so on and on and finally the last line ends at the last semicolon. I tried using your 2 batch files but I get several different files :(
I just need one result file which has all the content from the original file but there should be several statements as opposed to the original one statement. Also note, that it is a .txt file
I hope you can help me out again and I promise not to bother you again (for a whole month). Thank you so much.
 

foxidrive

Banned
Joined
Oct 20, 2012
Messages
793
Try this:

call it split.bat and launch it like this: split "filein.txt" "fileout.txt"


Code:
@echo off
 >_.vbs echo set regex=new regexp
>>_.vbs echo regex.global=true
>>_.vbs echo regEx.IgnoreCase=True            
>>_.vbs echo regex.pattern=";"
>>_.vbs echo wscript.stdOut.write regex.replace(wscript.stdin.readall,";"+vbCRLF)
cscript /nologo _.vbs <"%~1" >"%~2"
del _.vbs
 
Status
This thread has been Locked and is not open to further replies. Please start a New Thread if you're having a similar issue. View our Welcome Guide to learn how to use this site.

Users Who Are Viewing This Thread (Users: 0, Guests: 1)

As Seen On
As Seen On...

Welcome to Tech Support Guy!

Are you looking for the solution to your computer problem? Join our site today to ask your question. This site is completely free -- paid for by advertisers and donations.

If you're not already familiar with forums, watch our Welcome Guide to get started.

Join over 807,865 other people just like you!

Latest posts

Staff online

Members online

Top