Create Credentials File for S3 Keys
hadoop credential create fs.s3a.access.key -value <Access_KEY> -provider localjceks://file/$HOME/aws-dev-keys.jceks
hadoop credential create fs.s3a.secret.key -value <Secret_KEY> -provider localjceks://file/$HOME/aws-dev-keys.jceks
Where -
<Access_KEY> - S3 access key
<Secret_KEY> - S3 secret key
Note -
- This creates a file named aws-dev-keys.jceks in your home directory on the local file system.
- Put this file on HDFS for distributed access, as shown below.
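A minimal sketch of copying the keystore to HDFS; /myfilelocation is a placeholder directory (it should match the jceks paths used in the commands below), and the chmod keeps the file readable only by its owner -
hdfs dfs -mkdir -p /myfilelocation
hdfs dfs -put $HOME/aws-dev-keys.jceks /myfilelocation/
hdfs dfs -chmod 600 /myfilelocation/aws-dev-keys.jceks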
To list the aliases stored in the provider file, execute the below command -
hadoop credential list -provider localjceks://file/$HOME/aws-dev-keys.jceks
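If a key ever needs to be rotated, the existing alias can be deleted and recreated; a sketch (the -f flag skips the confirmation prompt, and <New_Access_KEY> is a placeholder) -
hadoop credential delete fs.s3a.access.key -f -provider localjceks://file/$HOME/aws-dev-keys.jceks
hadoop credential create fs.s3a.access.key -value <New_Access_KEY> -provider localjceks://file/$HOME/aws-dev-keys.jceks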
List Files in an S3 Bucket with the Hadoop Shell
hdfs dfs -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks -ls s3a://s3bucketname/
hdfs dfs -Dfs.s3a.access.key=<Access_KEY> -Dfs.s3a.secret.key=<Secret_KEY> -ls s3a://s3bucketname/
Note -
- Similarly, other hadoop/hdfs commands such as -put and -get can be executed with the same properties, as sketched below.
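A minimal sketch of -put and -get against the bucket, assuming the keystore has already been copied to HDFS; the local paths and file name are placeholders -
hdfs dfs -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks -put /tmp/sample.txt s3a://s3bucketname/mydir/
hdfs dfs -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks -get s3a://s3bucketname/mydir/sample.txt /tmp/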
Use distcp to copy data from S3 to HDFS -
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks s3a://s3bucketname/mydir/tar.gz /hdfs/mydata/
Use distcp to copy data from HDFS to S3 -
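A sketch mirroring the command above with the source and target swapped (bucket name, directory, and paths are placeholders) -
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks /hdfs/mydata/ s3a://s3bucketname/mydir/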
Refer - https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/s3-credential-providers.html
Note that by default the s3a connector points to AWS S3. If you want to access an S3-compatible store from another vendor (for example, HCP's S3 API), you can do so by specifying one more property, as below -
-D fs.s3a.endpoint=hcp.s3-compatible.tests3api.com
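For example, a directory listing against a non-AWS endpoint could look like this (the endpoint and bucket name are placeholders) -
hdfs dfs -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks -Dfs.s3a.endpoint=hcp.s3-compatible.tests3api.com -ls s3a://s3bucketname/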
The default keystore password for the credential provider is "none".
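To protect the keystore with your own password, one option (an assumption worth verifying against the Cloudera link above) is to export HADOOP_CREDSTORE_PASSWORD before creating and before reading the credentials -
export HADOOP_CREDSTORE_PASSWORD=<My_Password>
hadoop credential create fs.s3a.access.key -value <Access_KEY> -provider localjceks://file/$HOME/aws-dev-keys.jceks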
To read the value of an alias from the provider file, open spark-shell and execute the below commands -
val jceks_path = "jceks://hdfs/myfilelocation/aws-dev-keys.jceks"
val alias = "fs.s3a.access.key"
val conf = spark.sparkContext.hadoopConfiguration
conf.set("hadoop.security.credential.provider.path", jceks_path)
// getPassword returns the credential as an Array[Char]
val credential_raw = conf.getPassword(alias)
// Convert to a String for display
credential_raw.mkString