
Posts

Showing posts from May, 2023

CVE-2022-33891 Apache Spark Command Injection Vulnerability

Please refer - https://spark.apache.org/security.html

The command injection occurs because Spark checks the group membership of the user passed in the ?doAs parameter by running a raw Linux command. If an attacker sends a reverse-shell payload via ?doAs, there is also a high chance of the Apache Spark server connecting back to the attacker's machine.

Vulnerability description - The Apache Spark UI offers the possibility to enable ACLs via the configuration option spark.acls.enable. With an authentication filter, this checks whether a user has access permissions to view or modify the application. If ACLs are enabled, a code path in HttpSecurityFilter can allow someone to perform impersonation by providing an arbitrary user name. A malicious user might then be able to reach a permission check function that will ultimately build a Unix shell command based on their input, and execute it. This results in arbitrary shell command execution as the user Spark is currently running as.
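As an illustration of the attack shape only (a sketch - the host, port, and payload below are hypothetical placeholders, not a working exploit), public write-ups of this CVE show the injected command smuggled into the doAs parameter via shell backticks:

  # Hypothetical request against a vulnerable Spark UI with ACLs enabled
  curl 'http://<target>:8080/?doAs=`touch%20/tmp/poc`'

Because the supplied user name is interpolated into a group-lookup shell command such as id -Gn <user>, the backticked payload executes as the OS user running Spark. Per the Spark security page linked above, the mitigation is to upgrade to a patched release (3.1.3, 3.2.2, or 3.3.0 and later).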

Hive Metastore ER Diagram

[Image: Hive Metastore ER diagram - see the original post]

Hadoop Distcp to HCP or AWS S3a leading to Error - com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed

Running Hadoop DistCp to copy data from S3a resulted in the below error -

com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Stack trace:

com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114) ~[aws-java-sdk-core-1.11.280.jar!/:?]
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064) ~[aws-java-sdk-core-1.11.280.jar!/:?]

To debug this error, turn SSL debug logging on with -Djavax.net.debug=all or -Djavax.net.debug=ssl. These parameters can be added to the client JVM options.
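The error means the JVM running DistCp does not trust the certificate presented by the S3a/HCP endpoint. A sketch of one common fix - importing the endpoint's CA certificate into the client truststore (the paths, alias, and endpoint/bucket names below are placeholders, and the default cacerts location varies by JDK version):

  # Import the endpoint's CA certificate into the JVM default truststore
  keytool -importcert -trustcacerts -alias s3a-endpoint-ca \
      -file /tmp/endpoint-ca.pem \
      -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit

  # Re-run DistCp with SSL debugging to confirm the certificate path now validates
  export HADOOP_CLIENT_OPTS="-Djavax.net.debug=ssl"
  hadoop distcp -Dfs.s3a.endpoint=<endpoint> s3a://<bucket>/<path> hdfs:///<target-path>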

Spark Exception - Filtering is supported only on partition keys of type string

We got this exception while running SQL on a Hive table with a partition column of type BIGINT.

Example -

select * from mytable where cast(rpt_date AS STRING) >= date_format(date_sub(current_date(),60),'yyyyMMdd')

Exception -

Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:862)
  ... 101 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by...
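The metastore rejects pushed-down partition filters on non-string partition keys. One workaround we have seen suggested (a sketch against the example above - mytable and rpt_date come from that query; verify the behavior on your distribution) is to keep the predicate in the partition column's native integral type instead of casting the column to string:

  select *
  from mytable
  where rpt_date >= cast(date_format(date_sub(current_date(), 60), 'yyyyMMdd') as bigint)

If integral filters are still rejected, another commonly cited option is setting hive.metastore.integral.jdo.pushdown=true on the Hive Metastore side.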

PrestoDB (Trino) SQL Error - java.lang.UnsupportedOperationException: Storage schema reading not supported

We faced the following error while querying, via Trino, a Hive table defined on top of the AVRO file format.

Error -

java.lang.UnsupportedOperationException: Storage schema reading not supported

The solution is to set the following property in the Hive Metastore -

metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
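For reference, a sketch of how the property might look in the metastore's configuration file (the file name and the way it is applied vary by distribution - in Cloudera Manager, for example, it is typically added as a Hive Metastore safety valve; a metastore restart is needed either way):

  <property>
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
  </property>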

PrestoDB (Trino) SQL Error - ORC ACID file should have 6 columns, found Y

We faced this error while querying a Hive table using Trino -

Error -

SQL Error [16777223]: Query failed (#20230505_155701_00194_n2vdp): ORC ACID file should have 6 columns, found 17

This was happening because the table being queried was a Hive Managed (Internal) table, which by default in the CDP (Cloudera) distribution is ACID compliant.

Now, in order for a Hive table to be ACID compliant -

The underlying file format should be ORC, and there were a few changes to the ORC file structure, such as the root column being a struct with 6 nested columns (which enclose the data and the type of operation). Something like below -

  struct<
    operation: int,
    originalTransaction: bigInt,
    bucket: int,
    rowId: bigInt,
    currentTransaction: bigInt,
    row: struct<...>
  >

For more ORC ACID related internals - please take a look here: https://orc.apache.org/docs/acid.html

Now, the problem in our case was that though the Hive table was declared Internal (ACID), the underlying ORC files did not follow this ACID layout - hence Trino expected the 6-column ACID structure but found 17 plain data columns.
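To check whether a table is actually transactional, its table properties can be inspected (a sketch - mytable is a placeholder name):

  DESCRIBE FORMATTED mytable;
  -- look under Table Parameters for: transactional=true
  -- and, optionally: transactional_properties

If the underlying files are not in ACID layout, one commonly suggested way out is to recreate the table as EXTERNAL (non-transactional) over the same data and reload it, since Hive does not allow simply unsetting transactional=true on an existing table via ALTER TABLE.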