Access AWS S3 or HCP HS3 (Hitachi) Using Hadoop, HDFS, or DistCp

Create Credentials File for S3 Keys

hadoop credential create fs.s3a.access.key -value <Access_KEY> -provider localjceks://file/$HOME/aws-dev-keys.jceks

hadoop credential create fs.s3a.secret.key -value <Secret_KEY> -provider localjceks://file/$HOME/aws-dev-keys.jceks

Where -

<Access_KEY> - S3 access key

<Secret_KEY> - S3 secret key

Note -

This creates a file named aws-dev-keys.jceks in your home directory on the local file system.
Copy this file to HDFS so that jobs running on any node of the cluster can read it.
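
The upload step can be sketched as below. This is a minimal sketch, assuming the HDFS directory /myfilelocation used in the later commands already exists. The function names jceks_uri and upload_keystore and the HDFS_BIN override are made up for illustration, and restricting the file to its owner is an assumption that you may need to loosen if other users' jobs must read it. It also shows how an HDFS path maps to a provider URI: jceks://, then the underlying file system (hdfs), then the path.

```shell
# Build the provider URI for a keystore stored at an absolute HDFS path, e.g.
# /myfilelocation/aws-dev-keys.jceks -> jceks://hdfs/myfilelocation/aws-dev-keys.jceks
jceks_uri() {
  printf 'jceks://hdfs%s\n' "$1"
}

# Copy the local keystore ($1) to HDFS ($2), then make it readable by its owner only.
# HDFS_BIN is a hypothetical override for the hdfs executable (defaults to "hdfs").
upload_keystore() {
  "${HDFS_BIN:-hdfs}" dfs -put "$1" "$2" &&
    "${HDFS_BIN:-hdfs}" dfs -chmod 400 "$2"
}

# Usage:
#   upload_keystore "$HOME/aws-dev-keys.jceks" /myfilelocation/aws-dev-keys.jceks
#   jceks_uri /myfilelocation/aws-dev-keys.jceks
```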

To list the aliases stored in the provider file, run the command below -

hadoop credential list -provider localjceks://file/$HOME/aws-dev-keys.jceks

List files in S3 Bucket with hadoop Shell

hdfs dfs -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks -ls s3a://s3bucketname/

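Alternatively, pass the keys directly on the command line. Note that they are then visible in the shell history and the process list -

hdfs dfs -Dfs.s3a.access.key=<Access_KEY> -Dfs.s3a.secret.key=<Secret_KEY> -ls s3a://s3bucketname/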

Note -

Other hadoop/hdfs commands, such as -put and -get, can be run the same way.
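
As an illustration of -put and -get, here is a small wrapper that adds the credential provider path to any hdfs dfs sub-command. It is only a sketch: the function name s3a_dfs, the HDFS_BIN override, and the file name report.csv are hypothetical, and the provider path and bucket are the placeholder values from the commands above.

```shell
# Assumed values matching the examples above; adjust for your cluster.
PROVIDER="jceks://hdfs/myfilelocation/aws-dev-keys.jceks"
BUCKET="s3a://s3bucketname"

# Run any "hdfs dfs" sub-command with the credential provider path set.
# HDFS_BIN is a hypothetical override for the hdfs executable (defaults to "hdfs").
s3a_dfs() {
  "${HDFS_BIN:-hdfs}" dfs \
    -Dhadoop.security.credential.provider.path="$PROVIDER" "$@"
}

# Usage (report.csv is a placeholder file name):
#   s3a_dfs -put /tmp/report.csv "$BUCKET/mydir/"
#   s3a_dfs -get "$BUCKET/mydir/report.csv" /tmp/
```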

Use distcp to copy data from S3 to HDFS -

hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks s3a://s3bucketname/mydir/tar.gz /hdfs/mydata/

Use distcp to copy data from HDFS to S3 -

hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/myfilelocation/aws-dev-keys.jceks /hdfs/mydata/tar.gz s3a://s3bucketname/mydir/

Refer - https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/s3-credential-providers.html

By default, s3a:// requests go to AWS S3. To access the HCP S3 API or any other vendor's S3-compatible storage, specify one more property, as below -

-D fs.s3a.endpoint=hcp.s3-compatible.tests3api.com
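
Put together, a distcp run against a non-AWS endpoint might look like the sketch below. The function name distcp_s3compat and the HADOOP_BIN override are hypothetical. The fs.s3a.path.style.access setting is an assumption: many S3-compatible stores need path-style requests, but drop it if your endpoint supports virtual-hosted buckets or your Hadoop version does not have the property.

```shell
# Assumed values; the endpoint is the example host from above.
PROVIDER="jceks://hdfs/myfilelocation/aws-dev-keys.jceks"
ENDPOINT="hcp.s3-compatible.tests3api.com"

# Copy $1 to $2 with distcp, sending S3A requests to the custom endpoint.
# HADOOP_BIN is a hypothetical override for the hadoop executable (defaults to "hadoop").
distcp_s3compat() {
  "${HADOOP_BIN:-hadoop}" distcp \
    -Dhadoop.security.credential.provider.path="$PROVIDER" \
    -Dfs.s3a.endpoint="$ENDPOINT" \
    -Dfs.s3a.path.style.access=true \
    "$1" "$2"
}

# Usage:
#   distcp_s3compat s3a://s3bucketname/mydir/ /hdfs/mydata/
```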

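The default provider (keystore) password is "none". It applies whenever no other password is supplied, for example through the HADOOP_CREDSTORE_PASSWORD environment variable.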
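
To protect the keystore with a password of your own, one option is to set HADOOP_CREDSTORE_PASSWORD when creating the aliases. The sketch below assumes the password is already held in a variable named KEYSTORE_PASS; that variable, the function name create_alias, and the HADOOP_BIN override are hypothetical. Because -value is omitted, the command prompts for the secret instead of taking it from the command line. Keep in mind that the same password must then be available to every process that reads the keystore.

```shell
# Create alias $1 in provider $2, protecting the keystore with $KEYSTORE_PASS.
# HADOOP_BIN is a hypothetical override for the hadoop executable (defaults to "hadoop").
create_alias() {
  HADOOP_CREDSTORE_PASSWORD="$KEYSTORE_PASS" \
    "${HADOOP_BIN:-hadoop}" credential create "$1" -provider "$2"
}

# Usage:
#   create_alias fs.s3a.access.key "localjceks://file/$HOME/aws-dev-keys.jceks"
#   create_alias fs.s3a.secret.key "localjceks://file/$HOME/aws-dev-keys.jceks"
```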

To read the value of an alias from the provider file, open spark-shell and run the commands below -

val jceks_path="jceks://hdfs/myfilelocation/aws-dev-keys.jceks"

val alias="fs.s3a.access.key"

val conf = spark.sparkContext.hadoopConfiguration

conf.set("hadoop.security.credential.provider.path", jceks_path)

val credential_raw = conf.getPassword(alias)

credential_raw.mkString
