
Posts

Showing posts from May, 2023

CVE-2022-33891 Apache Spark Command Injection Vulnerability

  Please refer to https://spark.apache.org/security.html. The command injection occurs because Spark checks the group membership of the user passed in the ?doAs parameter by using a raw Linux command. If an attacker sends reverse-shell commands via ?doAs, there is a high chance of the Spark server opening a shell connection back to the attacker's machine. Vulnerability description - The Apache Spark UI offers the possibility to enable ACLs via the configuration option spark.acls.enable. With an authentication filter, this checks whether a user has access permissions to view or modify the application. If ACLs are enabled, a code path in HttpSecurityFilter can allow someone to perform impersonation by providing an arbitrary user name. A malicious user might then be able to reach a permission check function that will ultimately build a Unix shell command based on their input, and execute it. This will result in arbitrary shell command execution as the user Spark is currently...
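To see why building a Unix shell command from user-controlled input is dangerous, here is a minimal Python sketch. This is not Spark's actual code: `groups_unsafe`, `groups_safe`, and the `echo` stand-in for the group-lookup command are purely illustrative.

```python
import subprocess

def groups_unsafe(username: str) -> str:
    # Mirrors the vulnerable pattern: the attacker-controlled name is pasted
    # into a shell command line, so shell metacharacters like ';' are honored.
    cmd = "echo checking-groups-for " + username
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def groups_safe(username: str) -> str:
    # Safer pattern: pass the name as a single argv element; no shell parses it,
    # so the whole string is treated as one literal argument.
    return subprocess.run(["echo", "checking-groups-for", username],
                          capture_output=True, text=True).stdout
```

With input like `alice; echo INJECTED`, the unsafe variant executes the injected command as a second shell statement, while the safe variant just echoes the full string back as data.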

Hive Metastore ER Diagram

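As a companion to the ER diagram, the core relationships can be walked with a query against the metastore's backing RDBMS. Table and column names below follow the standard metastore schema (DBS, TBLS, SDS); adjust case/quoting for your database.

```sql
-- Databases (DBS) -> tables (TBLS) -> storage descriptors (SDS)
SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE, s.LOCATION
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS  s ON s.SD_ID = t.SD_ID
ORDER BY d.NAME, t.TBL_NAME;
```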

Hadoop Distcp to HCP or AWS S3a leading to Error - com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed

  Running Hadoop distcp to copy data from S3a resulted in the error below -

com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Stack trace:
com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114) ~[aws-java-sdk-core-1.11.280.jar!/:?]
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064) ~[aws-java-sdk-core-1.11.280.jar!/:?]

To debug this error, turn SSL debug logging on with -Djavax.net.debug=all, or -Djava...
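A PKIX path-building failure usually means the JVM's truststore does not contain the endpoint's certificate chain. A hedged sketch of one way to fix it is below; the host name, aliases, and paths are placeholders, and the exact truststore the job uses depends on your JAVA_HOME and job configuration.

```shell
# 1) Fetch the endpoint's certificate (replace s3.example.com with your endpoint):
openssl s_client -connect s3.example.com:443 -showcerts </dev/null \
  | openssl x509 -outform PEM > endpoint.pem

# 2) Import it into the truststore the distcp JVM uses
#    (default JDK cacerts password is usually "changeit"):
keytool -importcert -trustcacerts -alias s3-endpoint \
  -file endpoint.pem -keystore "$JAVA_HOME/lib/security/cacerts" \
  -storepass changeit -noprompt

# 3) Or point the mapper JVMs at a custom truststore:
hadoop distcp \
  -Dmapreduce.map.java.opts="-Djavax.net.ssl.trustStore=/path/to/truststore.jks" \
  s3a://bucket/path hdfs:///target/path
```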

Spark Exception - Filtering is supported only on partition keys of type string

We got this exception while running SQL on a Hive table whose partition column is of type BIGINT. Example -

select * from mytable where cast(rpt_date AS STRING) >= date_format(date_sub(current_date(),60),'yyyyMMdd')

Exception -
Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:862)
  ... 101 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at org.apache.hadoop.hive.metastore....
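One workaround, assuming the predicate allows it, is to avoid casting the partition key itself and instead cast the literal side to the key's native type, so the filter sent to the metastore stays on the BIGINT column. Whether pruning then succeeds can also depend on metastore settings such as hive.metastore.integral.jdo.pushdown.

```sql
-- Compare the bigint partition key against a bigint value instead of
-- casting the partition column to string:
select *
from mytable
where rpt_date >= cast(date_format(date_sub(current_date(), 60), 'yyyyMMdd') as bigint)
```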

PrestoDB (Trino) SQL Error - java.lang.UnsupportedOperationException: Storage schema reading not supported

  We faced the following error while querying, via Trino, a Hive table defined on top of the AVRO file format.

Error -
java.lang.UnsupportedOperationException: Storage schema reading not supported

The solution is to set the following property in the Hive Metastore -

metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
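For reference, on a metastore configured via hive-site.xml the same property can be expressed as below; the file location varies by distribution, and the Hive Metastore service must be restarted afterwards.

```xml
<property>
  <name>metastore.storage.schema.reader.impl</name>
  <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>
```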

PrestoDB (Trino) SQL Error - ORC ACID file should have 6 columns, found Y

  We faced this error while querying a Hive table using Trino -

Error -
SQL Error [16777223]: Query failed (#20230505_155701_00194_n2vdp): ORC ACID file should have 6 columns, found 17

This was happening because the table being queried was a Hive managed (internal) table, which by default in the CDP (Cloudera) distribution is ACID compliant. For a Hive table to be ACID compliant, the underlying file format should be ORC, and there were a few changes to the ORC file structure: the root column should be a struct with 6 nested columns (which enclose the data and the type of operation). Something like below

struct<
    operation: int,
    originalTransaction: bigint,
    bucket: int,
    rowId: bigint,
    currentTransaction: bigint,
    row: struct<...>
...
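When the querying engine cannot read this ACID layout, one common workaround is to materialize a plain, non-transactional copy of the table for Trino to read. A hedged sketch is below: the table name is illustrative, and whether CTAS with transactional=false is permitted (or silently translated to an external table) varies by Hive version and distribution.

```sql
-- Copy the ACID table's rows into a non-transactional ORC table
create table mytable_flat
stored as orc
tblproperties ('transactional'='false')
as select * from mytable;
```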