Recursively copy a directory (using cp, tar or rsync)

Revision: 22666

at June 7, 2010 20:12 by tm

Updated Code

# How does one make a recursive, identical copy of /source/ into /target/?

# I suppose you want to do that for archiving or duplicating something and want to preserve
# "everything". That includes permissions, ownership, file types, timestamps etc.
# Additionally I assume that you create /target in advance. cp and rsync do not require
# this, but otherwise I would have to make a case distinction for every method depending on
# whether the /target directory exists or not (but see note 4 at the very end). Check note
# 2 for a remark about sparse files.

# There are multiple ways to do it. On a GNU system the simplest way is:

############
# METHOD 1: Using GNU cp
cp -a /source/. /target/

# Two remarks:
# (1) If you used "/source" (or "/source/") as the first argument, you would end up with
# /target/source/... This is usually no problem though: simply cd into /target, then mv
# everything from ./source to . (either using mv twice, once with source/* and once with
# source/.*, or using it only once with "shopt -s dotglob" in front). Copying one level of
# directories too many (like "source" in the example above) is much less annoying than
# copying one level too few (and e.g. ending up with all the stuff from "source" beside
# /target instead of inside it).
# So if in doubt, use a syntax that copies "one level too many".
# However, it could be a problem if either "*" or ".*" expands to very many entries and
# you get an error along the lines of "argument list too long". In that case you'd need to
# work around the issue with a loop:

# NOTE: not a method to copy but how to clean up after copying "one level too many"
cd /target && shopt -s dotglob && for f in source/*; do mv "$f" .; done && rmdir source
# The best thing, of course, is to do it right in the first place, as I showed above.

# (2) The -a option to cp is a GNUism. But GNU tools are very widespread, so it's the first
# method I show and the preferred one if it's available, because of its simplicity. Note
# that the often used "cp -R" does not preserve attributes. "cp -R is a broken -a", even
# with -p (which typically still loses hard links), and especially if people suggest "-r"
# (lower case).
# YOU ALMOST NEVER WANT -R or -r!
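
# To see why the "/source/." form matters, here is a minimal sketch that runs the cp
# method on a throwaway tree. The mktemp directory and all file names are made up for
# illustration, and it assumes a cp that understands -a (GNU cp or compatible).

```shell
# Build a small scratch tree with a regular file and a dotfile (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/source/sub" "$demo/target"
echo visible > "$demo/source/sub/file"
echo hidden > "$demo/source/.dotfile"

# Copy the contents of source (dotfiles included) into the existing target.
cp -a "$demo/source/." "$demo/target/"

# List what arrived, relative to the target directory itself.
copied=$(cd "$demo/target" && find . | LC_ALL=C sort)
printf '%s\n' "$copied"

# Remove the scratch tree again.
rm -rf "$demo"
```

# The listing should show ./.dotfile and ./sub/file directly below the target, without an
# extra "source" level in between.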

# The second method is (pretty) universally available (and my personal favourite):

############
# METHOD 2: tar (available everywhere)
# (the "p" on extraction restores permissions as stored instead of applying the umask)
(cd /source && tar cf - .) | (cd /target && tar xpf -)

# I can't think of a system where that would not work (though it's not POSIX, since POSIX
# does not know tar...). Some notes as well:
# (1) the cd commands could be replaced by the -C option to tar, which again is a GNUism
# (though some other tar implementations have it too).
# (2) the "&&" in the extraction subshell makes sure that tar does not unpack into the
# current directory (and clobber your filesystem) if /target is not there.
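
# The -C variant from note (1) could look like the sketch below. It is only an
# illustration on a throwaway tree: the mktemp directory, the file names and the 640 mode
# are made up, and it assumes a tar that supports -C (GNU tar does).

```shell
# Scratch tree with one file that has a non-default mode (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/source/dir" "$demo/target"
echo data > "$demo/source/dir/file"
chmod 640 "$demo/source/dir/file"

# -C makes each tar change directory itself, so no cd and no subshell are needed.
# If the target directory is missing, the extracting tar fails instead of unpacking elsewhere.
tar -C "$demo/source" -cf - . | tar -C "$demo/target" -xpf -

# Compare the permission strings of the original and the copy.
src_mode=$(ls -l "$demo/source/dir/file" | cut -c1-10)
dst_mode=$(ls -l "$demo/target/dir/file" | cut -c1-10)
echo "source: $src_mode  target: $dst_mode"

rm -rf "$demo"
```

# Both permission strings should be identical if the copy preserved the mode.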

# The third (and last) method I show uses rsync. The advantage here is that it can be
# "restarted from where it left off" if interrupted (while cp or tar would have to start from
# the beginning). So it may be the preferred method for large amounts of data. The
# disadvantage is that rsync is far less readily available than tar or even GNU cp.

############
# METHOD 3: rsync (can be "re-started", mind the -H option!)
rsync -aH /source/ /target

# (Many people like to add a "-v" (--verbose) to the options to make rsync show what it
# does.) The rsync "a" option is similar to the cp "a" option. They both stand for
# "archive", which is what we want to do: preserve everything. For rsync, however, there is
# one notable exception: hard links. That's why we throw in the additional "H". Also note
# the trailing "/" on "/source/": it tells rsync to copy the contents of "/source", not
# "/source" itself. Omitting the trailing slash, we'd end up with "/target/source/...", which
# is not a disaster, as I noted above.
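
# The effect of the trailing slash can be tried out safely on a throwaway tree. The
# following is a sketch only: the mktemp directory and the names with_slash and
# without_slash are made up, and the rsync calls are skipped if rsync is not installed.

```shell
# Scratch tree with a single file and two empty targets (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/source" "$demo/with_slash" "$demo/without_slash"
echo data > "$demo/source/file"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash: the contents of source end up directly in the target.
    rsync -aH "$demo/source/" "$demo/with_slash"
    # No trailing slash: the directory "source" itself is created inside the target.
    rsync -aH "$demo/source" "$demo/without_slash"
    with_slash=$(cd "$demo/with_slash" && find . -type f)
    without_slash=$(cd "$demo/without_slash" && find . -type f)
else
    with_slash="rsync not installed"
    without_slash="rsync not installed"
fi
echo "with slash:    $with_slash"
echo "without slash: $without_slash"

rm -rf "$demo"
```

# Running the same rsync command a second time only transfers what is missing or changed,
# which is what makes the method restartable.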

# Some final notes:

# (1) There sure are other ways. I can think of cpio and pax (which nobody knows but which
#     is POSIX...)
# (2) If sparse files are an issue, check the -S parameter for rsync/(GNU) tar or --sparse
#     for (GNU) cp.
# (3) To copy "everything but some subdirectories" you need the respective exclude options
#     for tar/rsync. cp cannot do that, although by using bash extended globbing, trivial
#     use cases can be covered [Example: run "shopt -s extglob" first, then: echo /!(tmp)]
# (4) OK, I didn't want to... but: if /target does not exist (or is a file instead of a
#     directory), the "cp" command does not change. It errors if /target exists but is not a
#     directory, and creates the directory if no /target exists. The same goes for rsync.
#     The tar method requires a container in advance to "untar" into.
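
# For note (3), a minimal sketch of the tar method with one subdirectory left out. The
# mktemp directory and the names keep and tmp are made up for illustration. It assumes
# a tar with --exclude (GNU tar has it); the "./tmp" pattern is meant to hit only the
# top-level tmp, because member names in this archive start with "./".

```shell
# Scratch tree with one directory to keep and one to skip (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/source/keep" "$demo/source/tmp" "$demo/target"
echo a > "$demo/source/keep/file"
echo b > "$demo/source/tmp/scratch"

# Same pipeline as the tar method, with --exclude given before the file argument.
(cd "$demo/source" && tar cf - --exclude=./tmp .) | (cd "$demo/target" && tar xpf -)

# List what arrived, relative to the target directory itself.
copied=$(cd "$demo/target" && find . | LC_ALL=C sort)
printf '%s\n' "$copied"

rm -rf "$demo"
```

# The listing should contain ./keep/file but nothing below ./tmp.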

Revision: 22665

at January 18, 2010 21:19 by tm

Updated Code

Revision: 22664

at January 18, 2010 21:17 by tm

Updated Code

Revision: 22663

at January 18, 2010 21:11 by tm

Initial Code

Initial URL

Initial Description

Initial Title

Recursively copy a directory (using cp, tar or rsync)

Initial Tags

Bash, copy

Initial Language

Bash
