1) Table 1 has 100 records and table 2 has 10 records. If I do LEFT JOIN it is expected to get 100 records. But, 'm getting more records . How?
2) There are 2 data sets
Table 1
1,a
2,b
3,c
Table 2
1
2
I need to select all the records from Table 1 that are there in Table 2. Hot can I do that?
Solution 1 - Inner JOIN
Solution 2 - Sub Query (select * from Table 1 where ID in (select ID from Table 2))
Which one is better solution and why ?
3) If a table has too many partitions. Does it impact performance ? If Yes, how can we solve problem of too many partitions in Hive?
4) How will you solve "large number of small files" problem in Hive / Hadoop? What is the impact of having too many files?
5) How will you write a Pipe '|' character delimited file in Hive ?
6) Difference between ROW_NUMBER and RANK window function ?
7) I have a table as follows -
TABLE1
c1,c2,c3
1 ,2 ,3
How will I transpose column to rows? So, that output comes in a single column like
TABLE2
C,V
c1,1
c2,2
c3,3
8) What is the use of "Cluster By " clause in Hive SQL?
9) What is difference between inner & cross join?
10) How to remove duplicate records from a Table ?
11) I have a Table with 2 columns Department & Salary. I have to find Maximum Salary for each department. How will I achieve that ?
Table 1
Department, Salary
IT,7000
IT,3500
AD,200
IT,400
MG,5000
MG,45000
IT,30000
AD,20
12) I have a Table which has IN and OUT time of Employees. Like -
Table 1
Name,Time
A,10:00AM
B,08:00AM
C,05:00PM
A,4:00PM
C,09:00AM
B,06:00PM
I want to find out time difference between OUT and IN . So, that I know how much time every employee is spending in office. Result should be like -
Table 2
Name,Time
A, 4:00 PM - 10:00AM
B, 6:00PM - 8:00AM
C, 9:00AM - 5:00PM
How will I achieve it ? Can you think of any other solution to achieve same ?
13) Giving Hints in a SQL. If, I mention a Broadcast HINT in a SQL. Does it enforce that Job will have to do a broadcast or it is just a hint given to execution engine (MR, TEZ or Spark) that a broadcast join can be done?
14) I have data such that -
Table 1
C1,C2
A,V1
A,V2
A,V3
B,V2
B,V4
B,V1
I have to transpose data above from rows into columns. So, that data looks like -
Table 2
C1,C2,C3,C4
A,V1,V2,V3
B,V2,V4,V1
How will I do that ?
15) Non-Equi joins with Hive/ Spark?
2) There are 2 data sets
Table 1
1,a
2,b
3,c
Table 2
1
2
I need to select all the records from Table 1 that are there in Table 2. Hot can I do that?
Solution 1 - Inner JOIN
Solution 2 - Sub Query (select * from Table 1 where ID in (select ID from Table 2))
Which one is better solution and why ?
3) If a table has too many partitions. Does it impact performance ? If Yes, how can we solve problem of too many partitions in Hive?
4) How will you solve "large number of small files" problem in Hive / Hadoop? What is the impact of having too many files?
5) How will you write a Pipe '|' character delimited file in Hive ?
6) Difference between ROW_NUMBER and RANK window function ?
7) I have a table as follows -
TABLE1
c1,c2,c3
1 ,2 ,3
How will I transpose column to rows? So, that output comes in a single column like
TABLE2
C,V
c1,1
c2,2
c3,3
8) What is the use of "Cluster By " clause in Hive SQL?
9) What is difference between inner & cross join?
10) How to remove duplicate records from a Table ?
11) I have a Table with 2 columns Department & Salary. I have to find Maximum Salary for each department. How will I achieve that ?
Table 1
Department, Salary
IT,7000
IT,3500
AD,200
IT,400
MG,5000
MG,45000
IT,30000
AD,20
12) I have a Table which has IN and OUT time of Employees. Like -
Table 1
Name,Time
A,10:00AM
B,08:00AM
C,05:00PM
A,4:00PM
C,09:00AM
B,06:00PM
I want to find out time difference between OUT and IN . So, that I know how much time every employee is spending in office. Result should be like -
Table 2
Name,Time
A, 4:00 PM - 10:00AM
B, 6:00PM - 8:00AM
C, 9:00AM - 5:00PM
How will I achieve it ? Can you think of any other solution to achieve same ?
13) Giving Hints in a SQL. If, I mention a Broadcast HINT in a SQL. Does it enforce that Job will have to do a broadcast or it is just a hint given to execution engine (MR, TEZ or Spark) that a broadcast join can be done?
14) I have data such that -
Table 1
C1,C2
A,V1
A,V2
A,V3
B,V2
B,V4
B,V1
I have to transpose data above from rows into columns. So, that data looks like -
Table 2
C1,C2,C3,C4
A,V1,V2,V3
B,V2,V4,V1
How will I do that ?
15) Non-Equi joins with Hive/ Spark?
16)
Comments
Post a Comment