
Posts

Showing posts from May, 2023

CVE-2022-33891 Apache Spark Command Injection Vulnerability

Please refer - https://spark.apache.org/security.html

The command injection occurs because Spark checks the group membership of the user passed in the ?doAs parameter by running a raw Linux command. If an attacker sends a reverse-shell payload via ?doAs, there is also a high chance of the Apache Spark server connecting back to the attacker's machine.

Vulnerability description - The Apache Spark UI offers the possibility to enable ACLs via the configuration option spark.acls.enable. With an authentication filter, this checks whether a user has access permissions to view or modify the application. If ACLs are enabled, a code path in HttpSecurityFilter can allow someone to perform impersonation by providing an arbitrary user name. A malicious user might then be able to reach a permission check function that will ultimately build a Unix shell command based on their input, and execute it. This results in arbitrary shell command execution as the user Spark is currently running as.
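As an illustration of the attack shape only (a sketch - the host, port, and payload below are hypothetical placeholders, not a working exploit), public write-ups of this CVE show the injected command smuggled into the doAs parameter via shell backticks:

  # Hypothetical request against a vulnerable Spark UI with ACLs enabled
  curl 'http://<target>:8080/?doAs=`touch%20/tmp/poc`'

Because the supplied user name is interpolated into a group-lookup shell command such as id -Gn <user>, the backticked payload executes as the OS user running Spark. Per the Spark security page linked above, the mitigation is to upgrade to a patched release (3.1.3, 3.2.2, or 3.3.0 and later).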

Hive Metastore ER Diagram

[Image: Hive Metastore ER diagram - see the original post]

Hadoop Distcp to HCP or AWS S3a leading to Error - com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed

Running Hadoop DistCp to copy data from S3a resulted in the below error -

com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Stack trace:

com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114) ~[aws-java-sdk-core-1.11.280.jar!/:?]
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064) ~[aws-java-sdk-core-1.11.280.jar!/:?]

To debug this error, turn SSL debug logging on with -Djavax.net.debug=all or -Djavax.net.debug=ssl. These parameters can be added to the client JVM options.
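The error means the JVM running DistCp does not trust the certificate presented by the S3a/HCP endpoint. A sketch of one common fix - importing the endpoint's CA certificate into the client truststore (the paths, alias, and endpoint/bucket names below are placeholders, and the default cacerts location varies by JDK version):

  # Import the endpoint's CA certificate into the JVM default truststore
  keytool -importcert -trustcacerts -alias s3a-endpoint-ca \
      -file /tmp/endpoint-ca.pem \
      -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit

  # Re-run DistCp with SSL debugging to confirm the certificate path now validates
  export HADOOP_CLIENT_OPTS="-Djavax.net.debug=ssl"
  hadoop distcp -Dfs.s3a.endpoint=<endpoint> s3a://<bucket>/<path> hdfs:///<target-path>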

Spark Exception - Filtering is supported only on partition keys of type string

We got this exception while running SQL on a Hive table with a partition column of type BIGINT.

Example -

select * from mytable where cast(rpt_date AS STRING) >= date_format(date_sub(current_date(),60),'yyyyMMdd')

Exception -

Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:862)
  ... 101 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by...
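The metastore rejects pushed-down partition filters on non-string partition keys. One workaround we have seen suggested (a sketch against the example above - mytable and rpt_date come from that query; verify the behavior on your distribution) is to keep the predicate in the partition column's native integral type instead of casting the column to string:

  select *
  from mytable
  where rpt_date >= cast(date_format(date_sub(current_date(), 60), 'yyyyMMdd') as bigint)

If integral filters are still rejected, another commonly cited option is setting hive.metastore.integral.jdo.pushdown=true on the Hive Metastore side.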

PrestoDB (Trino) SQL Error - java.lang.UnsupportedOperationException: Storage schema reading not supported

We faced the following error while querying, via Trino, a Hive table defined on top of the AVRO file format.

Error -

java.lang.UnsupportedOperationException: Storage schema reading not supported

The solution is to set the following property in the Hive Metastore -

metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
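For reference, a sketch of how the property might look in the metastore's configuration file (the file name and the way it is applied vary by distribution - in Cloudera Manager, for example, it is typically added as a Hive Metastore safety valve; a metastore restart is needed either way):

  <property>
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
  </property>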

PrestoDB (Trino) SQL Error - ORC ACID file should have 6 columns, found Y

We faced this error while querying a Hive table using Trino -

Error -

SQL Error [16777223]: Query failed (#20230505_155701_00194_n2vdp): ORC ACID file should have 6 columns, found 17

This was happening because the table being queried was a Hive Managed (Internal) table, which by default in the CDP (Cloudera) distribution is ACID compliant.

Now, in order for a Hive table to be ACID compliant -

The underlying file format should be ORC, and there were a few changes to the ORC file structure, such as the root column being a struct with 6 nested columns (which enclose the data and the type of operation). Something like below -

  struct<
    operation: int,
    originalTransaction: bigInt,
    bucket: int,
    rowId: bigInt,
    currentTransaction: bigInt,
    row: struct<...>
  >

For more ORC ACID related internals - please take a look here: https://orc.apache.org/docs/acid.html

Now, the problem in our case was that though the Hive table was declared Internal (ACID), the underlying ORC files did not follow this ACID layout - hence Trino expected the 6-column ACID structure but found 17 plain data columns.
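To check whether a table is actually transactional, its table properties can be inspected (a sketch - mytable is a placeholder name):

  DESCRIBE FORMATTED mytable;
  -- look under Table Parameters for: transactional=true
  -- and, optionally: transactional_properties

If the underlying files are not in ACID layout, one commonly suggested way out is to recreate the table as EXTERNAL (non-transactional) over the same data and reload it, since Hive does not allow simply unsetting transactional=true on an existing table via ALTER TABLE.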