Any type of file, including binary files, can be split with the split utility of Solaris. Read the split man page for all the available options. In this example, we will use the following options to split one 11M file into several 3M files.
Syntax:
split [-b nm] [file [name]]
where:
-b nm splits the file into pieces of n megabytes each, and name is the prefix to be used for each of the files resulting from the split operation. split appends a suffix such as "aa", "ab", etc., to make the resulting file names unique and in ascending sort order.
eg.,
1. Splitting a large file
% ls -lh *.zip
-rw-r--r-- 1 giri other 11M Jun 30 12:10 src.zip
% split -b 3m src.zip source
% ls -lh source*
-rw-rw-r-- 1 giri other 3.0M Jun 30 17:18 sourceaa
-rw-rw-r-- 1 giri other 3.0M Jun 30 17:18 sourceab
-rw-rw-r-- 1 giri other 3.0M Jun 30 17:18 sourceac
-rw-rw-r-- 1 giri other 2.3M Jun 30 17:18 sourcead
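The number of pieces follows directly from the sizes: an input of S bytes split with a piece size of P bytes gives ceil(S / P) files, the last one holding the remainder. A minimal sketch that reproduces the example at a smaller scale, assuming `mktemp -d` is available; `demo.bin` and the `part` prefix are made-up names standing in for src.zip and source:

```shell
# Work in a scratch directory so the demo leaves the current one untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up 11 KB input standing in for the 11M src.zip.
dd if=/dev/zero of=demo.bin bs=1024 count=11 2>/dev/null

# Split into 3 KB pieces named partaa, partab, ...
split -b 3k demo.bin part

# Count the pieces: ceil(11 / 3) = 4.
pieces=$(ls part* | wc -l | tr -d ' ')
# The last piece holds the 2 KB remainder.
last_size=$(wc -c < partad | tr -d ' ')

echo "pieces: $pieces"
echo "size of last piece: $last_size bytes"
```

The same arithmetic explains the listing above: three full 3.0M pieces plus one 2.3M remainder.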
2. Re-creating the original file (assembling) from small files
Since the file was split into small files sequentially, we can use cat to merge them. If the number of files to merge is small, it is easy to use cat and redirect the output to a file.
eg.,
- One file at a time
% cat sourceaa > CopyOfsrc.zip
% cat sourceab >> CopyOfsrc.zip
% cat sourceac >> CopyOfsrc.zip
% cat sourcead >> CopyOfsrc.zip
- All files at once
% cat source* > CopyOfsrc.zip
- Some problems with the above approaches:
- it is inconvenient to redirect the output to a file one piece at a time if there are too many files (approach #1)
- it may fail with an "Arguments too long" error, depending on the shell, if there are too many files (approach #2)
- Workaround:
Use the combination of cat and xargs. xargs reads a group of arguments from its standard input, then runs a UNIX command with that group of arguments. It keeps reading arguments and running the command until it runs out of arguments. So the cat command line built by xargs is very unlikely to hit the "Arguments too long" limit.
eg.,
% ls source* | xargs cat > CopyOfsrc.zip
Since ls lists all the small files in sorted order, we don't have to worry about merging the files in the correct order.
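One caveat with the ls source* pipeline: the shell still expands the wildcard before it starts ls, so with a very large number of pieces ls itself can run into the same limit. Below is a sketch of two variants that keep the full list off any command line. It assumes the pieces sit in the current directory and that their names contain no spaces or newlines; `demo.bin`, `part`, `copy1.bin` and `copy2.bin` are made-up names for the demo.

```shell
# Scratch directory with a small made-up input split into 1 KB pieces.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/urandom of=demo.bin bs=1024 count=5 2>/dev/null
split -b 1k demo.bin part

# Variant 1: ls gets no arguments; grep selects the pieces and
# xargs runs cat on them in the sorted order that ls printed.
ls | grep '^part' | xargs cat > copy1.bin

# Variant 2: the for loop expands the wildcard inside the shell,
# so no external command ever receives the whole list.
: > copy2.bin                      # start from an empty output file
for piece in part*; do
    cat "$piece" >> copy2.bin
done

# cmp -s exits with status 0 when two files are byte-for-byte identical.
cmp -s demo.bin copy1.bin && echo "copy1 matches the original"
cmp -s demo.bin copy2.bin && echo "copy2 matches the original"
```

Variant 2 is approach #1 written as a loop, which removes the inconvenience of typing one cat per piece.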
% ls -lh CopyOfsrc.zip
-rw-rw-r-- 1 giri other 11M Jun 30 17:57 CopyOfsrc.zip
% file CopyOfsrc.zip
CopyOfsrc.zip: ZIP archive
Verify the checksum of both the files:
% cksum CopyOfsrc.zip
1904556195 11875601 CopyOfsrc.zip
% cksum src.zip
1904556195 11875601 src.zip
Note:
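The comparison can also be scripted, so that a mismatch is reported without reading the numbers by eye. A minimal sketch on a small made-up file (`demo.bin`, `part` and `copy.bin` are illustrative names, not from the example above). Feeding the file on standard input keeps the file name out of the cksum output, so the two results can be compared as plain strings:

```shell
# Scratch directory: create a made-up input, split it, and reassemble it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/urandom of=demo.bin bs=1024 count=7 2>/dev/null
split -b 2k demo.bin part
cat part* > copy.bin

# With standard input, cksum prints only "<CRC> <byte count>".
orig_sum=$(cksum < demo.bin)
copy_sum=$(cksum < copy.bin)

if [ "$orig_sum" = "$copy_sum" ]; then
    status="match"
else
    status="MISMATCH"
fi

echo "original: $orig_sum"
echo "copy:     $copy_sum"
echo "result:   $status"
```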
The files produced by split are raw, byte-for-byte pieces of the original. In our example, file reports sourceaa as a "ZIP archive" because that piece begins with the original zip header, but no sourcexy piece is a complete archive on its own, so trying to extract files from one results in an error.
% file sourceaa
sourceaa: ZIP archive
% unzip sourceaa
Archive: sourceaa
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of sourceaa or
sourceaa.zip, and cannot find sourceaa.ZIP, period.
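To see that the pieces are nothing more than consecutive byte ranges of the original, one can cut the same range out with dd and compare. A small sketch under made-up sizes and names (a 5 KB `demo.bin` split into 2 KB pieces with the prefix `part`): the second piece should equal bytes 2048 to 4095 of the input.

```shell
# Scratch directory with a made-up 5 KB input split into 2 KB pieces.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/urandom of=demo.bin bs=1024 count=5 2>/dev/null
split -b 2k demo.bin part

# Copy the second 2048-byte block of the original: skip one block, copy one.
dd if=demo.bin of=slice.bin bs=2048 skip=1 count=1 2>/dev/null

if cmp -s partab slice.bin; then
    same="yes"
else
    same="no"
fi
echo "partab equals bytes 2048-4095 of demo.bin: $same"

# The last piece holds only the 1 KB remainder.
last_size=$(wc -c < partac | tr -d ' ')
echo "partac size: $last_size bytes"
```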