Monday, February 8, 2016

File Compression and Archiving in Linux

INTRODUCTION TO FILE COMPRESSION AND ARCHIVING

This article walks through 20 useful tar and zip commands. It is useful to store a group of files in one file for easy backup, for transfer to another directory, or for transfer to another computer. It is also useful to compress large files; compressed files take up less disk space and download faster over the Internet.

It is important to understand the distinction between an archive file and a compressed file. An archive file is a collection of files and directories stored in one file. The archive file is not compressed — it uses the same amount of disk space as all the individual files and directories combined. A compressed file is a collection of files and directories that are stored in one file and stored in a way that uses less disk space than all the individual files and directories combined. If disk space is a concern, compress rarely-used files, or place all such files in a single archive file and compress it.

Note: A tar file on its own is not compressed. A compressed tar file (for example .tar.gz or .tar.bz2) is an archive that has also been compressed.

Several compression formats are available, some of them used through the tar command, and this article takes a few of them as examples. All of them can compress files and directories, but their compression ratios differ, so pick the option that matches the format you want.

1. gzip

2. bzip2

3. zip

Syntax: tar <options> <archive name.tar> <directory / file path>

1. ARCHIVING FILES USING TAR COMMAND

Archiving is not compression of files and directories; it groups all the files and directories together into a single file instead of multiple files. After creating an archive file, we see no real size difference between the original files and the archive.

Let's see an example below

[root@Linuxforfreshers.com tar]# du -h *.txt   <<-- File sizes before creating an archive
44K     d.txt
44K     g.txt
44K     kumar.txt
44K     raghu.txt
44K     linux.txt
44K     test1.txt
44K     test2.txt
44K     test3.txt
44K     test4.txt
[root@Linuxforfreshers.com tar]# tar -cvf raghu.tar *.txt   <<-- Command to create an archive file
d.txt
g.txt
kumar.txt
raghu.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt
[root@Linuxforfreshers.com tar]# du -h raghu.tar   <<-- Archive file size after creation
380K    raghu.tar

Explanation of the tar command options:

-c Create an archive file

-v Verbose (display each file as it is added to the archive)

-f Specify the name of the archive file
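
The same comparison can be reproduced in a throwaway directory. This is a minimal sketch: the file names (a.txt, b.txt, c.txt, sample.tar) are made up for illustration, and it compares byte counts rather than du output.

```shell
#!/bin/sh
set -e
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create three hypothetical 40 KB sample files.
for name in a b c; do
    head -c 40960 /dev/zero > "$name.txt"
done

# -c creates the archive, -f names it (-v is left out to keep the output short).
tar -cf sample.tar a.txt b.txt c.txt

# Compare the total size of the files with the size of the archive.
payload=$(cat a.txt b.txt c.txt | wc -c)
archive=$(wc -c < sample.tar)
echo "files: $payload bytes, archive: $archive bytes"
```

The archive comes out slightly larger than the files it holds, because tar adds a header for each file and pads the archive, and nothing is compressed.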

2. EXTRACTING AN ARCHIVE FILE

To extract an archive file, use the -x option with the tar command.

[root@Linuxforfreshers.com tar]# tar -xvf raghu.tar
d.txt
g.txt
kumar.txt
raghu.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt

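
By default tar extracts into the current directory. If the files should land somewhere else, the -C option changes the target directory. A small sketch, assuming a hypothetical directory named restore:

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny archive to extract from (names are illustrative).
echo "hello" > note.txt
tar -cf sample.tar note.txt

# -C tells tar to change into the given directory before extracting.
mkdir restore
tar -xf sample.tar -C restore

cat restore/note.txt
```

The directory given to -C must already exist, which is why the sketch creates it first.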
3. UPDATING AN ARCHIVE FILE WITH NEWLY CREATED FILES

Sometimes we need to update an archive file by adding only newly created files. Adding only the new files, instead of recreating the whole archive, saves a lot of time.

In the example below, the -u option updates the tar file with the newly created file.

[root@Linuxforfreshers.com tar]# touch Linuxforfreshers.coms.txt
[root@Linuxforfreshers.com tar]# tar -uvf raghu.tar *.txt
Linuxforfreshers.coms.txt

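
The update behaviour can be tried safely with a couple of made-up files. The sketch below assumes GNU tar; old.txt, new.txt and sample.tar are hypothetical names. Note that -u works on plain .tar files only, not on compressed archives.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "first" > old.txt
# Give old.txt a whole-second timestamp so the "is it newer?" check is unambiguous.
touch -t 201602080000 old.txt
tar -cf sample.tar old.txt

# A file created after the archive was built.
echo "second" > new.txt

# -u appends only files that are missing from the archive
# or newer than their archived copy.
tar -uf sample.tar old.txt new.txt

listing=$(tar -tf sample.tar)
echo "$listing"
```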
4. LIST FILES FROM ARCHIVE WITHOUT EXTRACTING THEM

We do not always need to extract an archive just to see its contents. If the archive is large, extraction takes a lot of time and also requires enough free disk space to hold the extracted files.

Use the '-t' option to list all the files in an archive file.

[root@Linuxforfreshers.com tar]# tar -tf raghu.tar
d.txt
g.txt
kumar.txt
raghu.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt
Linuxforfreshers.coms.txt

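
Because the listing is plain text, it is easy to use in scripts, for example to count members or to check that a file is present before restoring it. A minimal sketch with illustrative file names:

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "one" > first.txt
echo "two" > second.txt
tar -cf sample.tar first.txt second.txt

# Adding -v to -t also shows permissions, owner, size and date.
tar -tvf sample.tar

# Count the members without extracting anything.
count=$(tar -tf sample.tar | wc -l)
echo "members: $count"

# Check for one exact member name.
if tar -tf sample.tar | grep -qx "second.txt"; then
    found=yes
else
    found=no
fi
echo "second.txt present: $found"
```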
5. EXTRACT SINGLE FILE FROM ARCHIVE

This is very handy when we have a large archive file and need to restore only a single file from it. To restore a single file, give its name after the archive name.

[root@Linuxforfreshers.com tar]# rm -rf *.txt   <<-- Deleted all the text files from the current location
[root@Linuxforfreshers.com tar]# ls   <<-- After deletion we have the files below
3  arkit10.doc  arkit1.doc  arkit2.doc  arkit3.doc  arkit5.doc  arkit6.doc  arkit7.doc  arkit8.doc  arkit9.doc  raghu.tar
[root@Linuxforfreshers.com tar]# tar -xvf raghu.tar Linuxforfreshers.coms.txt   <<-- Restored a single file from the archive
Linuxforfreshers.coms.txt
[root@Linuxforfreshers.com tar]# ls   <<-- After restoration we have the files below
3            arkit1.doc  arkit3.doc  arkit6.doc  arkit8.doc  raghu.tar
arkit10.doc  arkit2.doc  arkit5.doc  arkit7.doc  arkit9.doc  Linuxforfreshers.coms.txt

The above example shows how to restore a single file from an archive.

6. EXTRACT MULTIPLE FILES FROM ARCHIVE (NOT ALL FILES)

In step 5 we extracted a single file from the archive. In the same way, we can extract several files from the archive without extracting all of them.

Note: To extract files from an archive you have to know the exact file names. Use '-t' to see all the files in the archive.

[root@Linuxforfreshers.com tar]# rm -rf Linuxforfreshers.coms.txt   <<-- Deleted the previously restored file for clarity
[root@Linuxforfreshers.com tar]# tar -xvf raghu.tar "Linuxforfreshers.coms.txt" "test1.txt"
test1.txt
Linuxforfreshers.coms.txt
[root@Linuxforfreshers.com tar]# ls
3            arkit1.doc  arkit3.doc  arkit6.doc  arkit8.doc  raghu.tar                  test1.txt
arkit10.doc  arkit2.doc  arkit5.doc  arkit7.doc  arkit9.doc  Linuxforfreshers.coms.txt
[root@Linuxforfreshers.com tar]# rm -rf Linuxforfreshers.coms.txt test1.txt
[root@Linuxforfreshers.com tar]# tar -xvf raghu.tar --wildcards '*.txt'
d.txt
g.txt
kumar.txt
raghu.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt
Linuxforfreshers.coms.txt

Note: We deleted the previous files for demonstration only. DO NOT DELETE FILES in your environment.

You can name multiple files, or use the --wildcards option with a quoted pattern to restore multiple files, as shown in the above example. Quoting the pattern keeps the shell from expanding it before tar sees it.
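
The effect of the pattern is easier to see when the archive holds more than one file type. The following sketch assumes GNU tar, which provides --wildcards; the file names are made up.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "text" > a.txt
echo "text" > b.txt
echo "doc"  > c.doc
tar -cf sample.tar a.txt b.txt c.doc

# Remove the originals (safe here: this is a temporary directory).
rm a.txt b.txt c.doc

# The quoted pattern reaches tar unchanged and is matched
# against the member names stored in the archive.
tar -xf sample.tar --wildcards '*.txt'

ls
```

Only the two .txt members come back; c.doc stays in the archive.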

7. COMPRESSING FILES IN GZIP

So far we have seen how to archive files (group files together in a single file). Creating an archive did not save any space, because archiving does not compress files and the size does not decrease. Compressing files saves disk space. To create a 'gzip' compressed archive with the '.gz' extension, use the '-z' option with the 'tar' command.

Let's see an example

[root@Linuxforfreshers.com tar]# tar -czvf linux.tar.gz *.txt
d.txt
g.txt
kumar.txt
raghu.txt
Linuxforfreshers.coms.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt
[root@Linuxforfreshers.com tar]# ls
3            arkit2.doc  arkit6.doc  arkit9.doc  kumar.txt  linux.tar.gz               test1.txt  test4.txt
arkit10.doc  arkit3.doc  arkit7.doc  d.txt       raghu.tar  Linuxforfreshers.coms.txt  test2.txt
arkit1.doc   arkit5.doc  arkit8.doc  g.txt       raghu.txt  linux.txt                  test3.txt
[root@Linuxforfreshers.com tar]# du -h linux.tar.gz
4.0K    linux.tar.gz
[root@Linuxforfreshers.com tar]# du -h *.txt
44K     d.txt
44K     g.txt
44K     kumar.txt
44K     raghu.txt
0       Linuxforfreshers.coms.txt
44K     linux.txt
44K     test1.txt
44K     test2.txt
44K     test3.txt
44K     test4.txt

As shown in the above example, compressing the text files with '-z' produced a 4KB file, whereas the uncompressed archive of the same files was 380KB.
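
To see the saving in exact bytes, the plain and the gzip-compressed archive can be built from the same input and compared. A minimal sketch; big.txt is a hypothetical, highly compressible file, so real data will compress less well.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

# 200 KB of zero bytes: an extreme, best-case input for compression.
head -c 204800 /dev/zero > big.txt

tar -cf plain.tar big.txt        # archive only
tar -czf packed.tar.gz big.txt   # archive and compress with gzip

plain_size=$(wc -c < plain.tar)
gz_size=$(wc -c < packed.tar.gz)
echo "plain.tar:     $plain_size bytes"
echo "packed.tar.gz: $gz_size bytes"

# -t together with -z lists a compressed archive without extracting it.
members=$(tar -tzf packed.tar.gz)
echo "$members"
```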

8. COMPRESSING FILES USING BZIP2

It works the same way as 'gzip', but '.bz2' generally achieves a higher compression ratio than '.gz'. We compress the same files used in the above example to see the resulting file size. For 'bzip2' we have to use the '-j' option.

[root@Linuxforfreshers.com tar]# tar -cjvf 1linux.tar.bz2 *.txt
d.txt
g.txt
kumar.txt
raghu.txt
Linuxforfreshers.coms.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt
[root@Linuxforfreshers.com tar]# du -h 1linux.tar.bz2
4.0K    1linux.tar.bz2

These files are too small to show a difference between the two methods, so a practical comparison of '.gz' and '.bz2' on larger data follows.

9. COMPRESSION RATIO OF .GZ (GZIP) AND .BZ2 (BZIP2)

After compressing 34MB of files with gzip ('.gz'), the output file size is 8.6MB.

The same files compressed with bzip2 ('.bz2') produce a 7.2MB output file. In this comparison, the .bz2 compression ratio is higher than that of .gz.

[root@Linuxforfreshers.com tar]# du -h tarr.tar.gz
8.6M    tarr.tar.gz
[root@Linuxforfreshers.com tar]# du -h tarr.tar.bz2
7.2M    tarr.tar.bz2

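
A comparison like this is easy to repeat on your own data. The sketch below uses a generated file of numbered lines as stand-in input (an assumption, not the 34MB data set above) and needs both gzip and bzip2 installed. The ratio depends heavily on the data, so treat the numbers it prints as specific to this input.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: about 1.3 MB of numbered lines.
seq 1 200000 > numbers.txt

tar -czf numbers.tar.gz numbers.txt    # gzip
tar -cjf numbers.tar.bz2 numbers.txt   # bzip2

orig_size=$(wc -c < numbers.txt)
gz_size=$(wc -c < numbers.tar.gz)
bz2_size=$(wc -c < numbers.tar.bz2)

echo "original: $orig_size bytes"
echo "gzip:     $gz_size bytes"
echo "bzip2:    $bz2_size bytes"
```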
10. EXTRACTING COMPRESSED FILES FROM 'GZIP' AND 'BZIP2'

To extract 'gzip' and 'bzip2' archives, use the '-x' option together with the format's own option: '-z' for gzip and '-j' for bzip2.

Below is an example of extracting the 'bzip2' file

[root@Linuxforfreshers.com tar]# tar -xjvf 1linux.tar.bz2
d.txt
g.txt
kumar.txt
raghu.txt
Linuxforfreshers.coms.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt

Below is a practical example of extracting the 'gzip' file

[root@Linuxforfreshers.com tar]# tar -xzvf linux.tar.gz
d.txt
g.txt
kumar.txt
raghu.txt
Linuxforfreshers.coms.txt
linux.txt
test1.txt
test2.txt
test3.txt
test4.txt

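
With GNU tar, the format option can usually be left out when extracting, because tar recognises the compression from the file itself. A short sketch under that assumption, with illustrative file names:

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "payload" > data.txt
tar -czf data.tar.gz data.txt
rm data.txt

# No -z here: GNU tar detects the gzip compression on its own.
tar -xf data.tar.gz

cat data.txt
```

Writing '-z' or '-j' explicitly, as in the examples above, still works and keeps the command portable to tar versions without this detection.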
11. ZIPPING THE FILES USING ZIP COMMAND

The zip command compresses files into an archive with the .zip extension. zip is available on different platforms such as Unix, Linux, Windows and Mac.

Syntax:  zip <destination file path and name>.zip  <source files to compress>

Below is an example of compressing files using the 'zip' command

[root@Linuxforfreshers.com tar]# zip docfiles.zip *.txt
  adding: d.txt (deflated 100%)
  adding: g.txt (deflated 100%)
  adding: kumar.txt (deflated 100%)
  adding: raghu.txt (deflated 100%)
  adding: Linuxforfreshers.coms.txt (stored 0%)
  adding: linux.txt (deflated 100%)
  adding: test1.txt (deflated 100%)
  adding: test2.txt (deflated 100%)
  adding: test3.txt (deflated 100%)
  adding: test4.txt (deflated 100%)

12. ZIPPING FILES AND DIRECTORIES ALONG WITH SUB DIRECTORIES AND ITS FILES

When we compress a directory using the 'zip' command, it does not include the subdirectories and their contents by default. To include all the subdirectories and their files, use the '-r' option with the zip command.

[root@Linuxforfreshers.com tar]# zip -r subdir.zip raghu/
  adding: raghu/ (stored 0%)
  adding: raghu/kumar/ (stored 0%)
  adding: raghu/kumar/linux/ (stored 0%)
  adding: raghu/kumar/linux/d.txt (deflated 100%)
  adding: raghu/kumar/linux/g.txt (deflated 100%)
  adding: raghu/kumar/linux/kumar.txt (deflated 100%)
  adding: raghu/kumar/linux/raghu.txt (deflated 100%)

13. COMPRESSING WITH HIGH COMPRESSION RATIO

The zip command also lets us choose a compression level from 1 to 9. Level 1 is the fastest, and level 9 gives the highest compression.

[root@Linuxforfreshers.com tar]# zip -9 -r deepcompress.zip raghu/
  adding: raghu/ (stored 0%)
  adding: raghu/kumar/ (stored 0%)
  adding: raghu/kumar/linux/ (stored 0%)
  adding: raghu/kumar/linux/d.txt (deflated 100%)
  adding: raghu/kumar/linux/g.txt (deflated 100%)
  adding: raghu/kumar/linux/kumar.txt (deflated 100%)
  adding: raghu/kumar/linux/raghu.txt (deflated 100%)
  adding: raghu/kumar/linux/Linuxforfreshers.coms.txt (stored 0%)
  adding: raghu/kumar/linux/linux.txt (deflated 100%)
  adding: raghu/kumar/linux/test1.txt (deflated 100%)
  adding: raghu/kumar/linux/test2.txt (deflated 100%)
  adding: raghu/kumar/linux/test3.txt (deflated 100%)
  adding: raghu/kumar/linux/test4.txt (deflated 100%)

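
How much the level matters depends on the data, so it is worth measuring on a sample. The sketch below compresses one generated file (a hypothetical numbers.txt) at levels 1 and 9 and prints both sizes; it skips itself if zip is not installed.

```shell
#!/bin/sh
set -e
if ! command -v zip >/dev/null 2>&1; then
    echo "zip is not installed; skipping"
    exit 0
fi
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: 50000 numbered lines.
seq 1 50000 > numbers.txt

zip -q -1 fast.zip numbers.txt   # fastest, least compression
zip -q -9 best.zip numbers.txt   # slowest, most compression

fast_size=$(wc -c < fast.zip)
best_size=$(wc -c < best.zip)
echo "level 1: $fast_size bytes"
echo "level 9: $best_size bytes"
```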
14. EXCLUDING PARTICULAR FILE / DIRECTORY FROM COMPRESSION

We can also exclude a file from compression. To do that, use the '-x' option followed by the file to leave out.

[root@Linuxforfreshers.com tar]# zip -r compress1.zip raghu/ -x raghu/g.txt
  adding: raghu/ (stored 0%)
  adding: raghu/d.txt (deflated 100%)
  adding: raghu/kumar.txt (deflated 100%)
  adding: raghu/raghu.txt (deflated 100%)
  adding: raghu/Linuxforfreshers.coms.txt (stored 0%)
  adding: raghu/linux.txt (deflated 100%)
  adding: raghu/test1.txt (deflated 100%)
  adding: raghu/test2.txt (deflated 100%)
  adding: raghu/test3.txt (deflated 100%)
  adding: raghu/test4.txt (deflated 100%)
[root@Linuxforfreshers.com tar]# ls raghu/
d.txt  g.txt  kumar.txt  raghu.txt  Linuxforfreshers.coms.txt  linux.txt  test1.txt  test2.txt  test3.txt  test4.txt

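
The '-x' option also accepts patterns, which helps when a whole subdirectory should be left out. A minimal sketch, assuming a hypothetical project directory with a logs subdirectory; the script skips itself if zip or unzip is missing.

```shell
#!/bin/sh
set -e
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed; skipping"
    exit 0
fi
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p project/logs
echo "keep" > project/main.txt
echo "skip" > project/logs/debug.log

# Quote the pattern so that zip, not the shell, interprets the wildcard.
zip -q -r project.zip project -x 'project/logs/*'

# unzip -Z1 prints the stored names, one per line.
names=$(unzip -Z1 project.zip)
echo "$names"
```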
15. DELETE PARTICULAR FILE FROM ZIP

We can also delete a file from a zip archive using the '-d' option with the zip command.

[root@Linuxforfreshers.com tar]# zip -d compress1.zip raghu/linux.txt
deleting: raghu/linux.txt

16. UPDATE NEWLY CREATED FILES TO ZIP

We can update a zip file using the '-u' option. It adds newly created files to the zip file and also replaces entries whose files have been modified since they were added.

[root@Linuxforfreshers.com tar]# touch Update2.txt
[root@Linuxforfreshers.com tar]# zip -u compress1.zip *.txt
  adding: Update2.txt (stored 0%)

17. UPDATE ZIP WITH NEWLY MODIFIED FILES

To refresh only the files that are already in the zip file and have been modified since, use the '-f' (freshen) option, combined here with '-r' as '-fr'. Unlike '-u', freshen does not add new files to the zip file.

[root@Linuxforfreshers.com tar]# zip -fr compress1.zip *.txt
freshening: Update2.txt (stored 0%)

18. LIST ALL FILES FROM ZIP WITHOUT EXTRACTING THEM

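To list the files in a zip archive without extracting them, open the archive with 'less'. This works on systems where the less input preprocessor is configured; 'unzip -l compress.zip' gives a similar listing everywhere unzip is installed.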

# less compress.zip
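
For scripts, the listing from unzip is easier to work with than a pager. The sketch below builds a small archive from made-up files and lists it in two ways; it skips itself if zip or unzip is missing.

```shell
#!/bin/sh
set -e
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed; skipping"
    exit 0
fi
workdir=$(mktemp -d)
cd "$workdir"

echo "a" > a.txt
echo "b" > b.txt
zip -q files.zip a.txt b.txt

# -l prints a table with length, date, time and name for every entry.
unzip -l files.zip

# -Z1 switches to zipinfo mode and prints only the names, one per line.
names=$(unzip -Z1 files.zip)
echo "$names"
```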
19. CHECK ZIP FILE CONTENT WITHOUT EXTRACTING

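To see the content of a zipped file without extracting it, you can use the 'zmore' and 'zless' commands. These tools are built on gzip, so with a .zip archive they are only reliable when the archive holds a single deflated file. For one member of a larger archive, 'unzip -p compress.zip <file name> | less' sends that member to the pager instead.

# zmore compress.zip
# zless compress.zip
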
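
Reading one member of a zip archive without writing anything to disk can be sketched with 'unzip -p', which prints the member to standard output. The file names below are hypothetical, and the script skips itself if zip or unzip is missing.

```shell
#!/bin/sh
set -e
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed; skipping"
    exit 0
fi
workdir=$(mktemp -d)
cd "$workdir"

printf 'line one\nline two\n' > readme.txt
echo "other" > other.txt
zip -q docs.zip readme.txt other.txt

# Remove the originals so it is clear that nothing gets extracted.
rm readme.txt other.txt

# -p writes the named member to standard output only.
content=$(unzip -p docs.zip readme.txt)
echo "$content"
```

In an interactive session the output would normally be piped into less or more rather than stored in a variable.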
20. DE-COMPRESS ZIP FILE

To extract a zip file, use the 'unzip' command. If the files already exist, it asks for confirmation before overwriting them.

[root@Linuxforfreshers.com tar]# unzip compress1.zip
Archive:  compress1.zip
replace d.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: d.txt
replace g.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: g.txt
replace kumar.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: kumar.txt
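
In scripts the confirmation prompt gets in the way. unzip has options to answer it in advance: '-n' never overwrites, '-o' always overwrites, and '-d' extracts into another directory. A minimal sketch with made-up names (notes.zip, note.txt, out); it skips itself if zip or unzip is missing.

```shell
#!/bin/sh
set -e
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed; skipping"
    exit 0
fi
workdir=$(mktemp -d)
cd "$workdir"

echo "archived" > note.txt
zip -q notes.zip note.txt

# Change the file on disk so we can tell which copy survives.
echo "local edit" > note.txt

# -n: never overwrite existing files, so the local edit is kept.
unzip -q -n notes.zip
kept=$(cat note.txt)

# -o: overwrite existing files without asking, so the archived copy returns.
unzip -q -o notes.zip
restored=$(cat note.txt)

# -d: extract into a separate directory instead of the current one.
unzip -q notes.zip -d out
echo "$kept / $restored / $(cat out/note.txt)"
```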
