You may hit the following error while copying data from one cluster to another using DistCp.
Command:
hadoop distcp -i {src} {tgt}
Error:
org.apache.hadoop.tools.CopyListing$DuplicateFileException: File would cause duplicates.
Normally, two files in a copy should never end up with the same name. What is most likely happening is that you are copying a partitioned table from one cluster to another, and two differently named partitions contain a file with the same name.
The solution is to correct the source path {src} in your command so that it stops at the directory that contains the partition subdirectories, rather than expanding down to the individual files.
For example, consider these two files:
/a/partcol=1/file1.txt
/a/partcol=2/file1.txt
- If you use {src} as "/a/*/*", you will get the error "File would cause duplicates."
- But if you use {src} as "/a", the copy will run without this error.
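To see why the glob fails, it helps to look at the name each matched file would get at the target. The following is a minimal sketch, not DistCp's own logic: it assumes that every file matched by the glob is copied into {tgt} under its bare file name, and it reports any name that occurs more than once. The function name `duplicate_basenames` is hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads one path per line on stdin
# and prints every file name that appears under more than one parent directory.
duplicate_basenames() {
  # Keep only the last path component, then list the names seen more than once.
  awk -F/ '{ print $NF }' | sort | uniq -d
}

# The two paths that the glob "/a/*/*" expands to in the example above.
printf '%s\n' \
  '/a/partcol=1/file1.txt' \
  '/a/partcol=2/file1.txt' | duplicate_basenames
```

On a real cluster the same function could probably be fed from `hdfs dfs -ls -C '/a/*/*'`, assuming your Hadoop version supports the `-C` (paths only) option of `ls`. Any name it prints is a file that would collide, which is a sign that {src} should point to the parent directory instead.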