Hive Subqueries
Create the employee table:-
hive> create table employeetab(empid string, fullname string, ssn string, department string, startdate string, salary int)
> row format delimited
> fields terminated by '\t'
> stored as textfile
> TBLPROPERTIES("skip.header.line.count"="1");
OK
To check the schema:-
hive> desc employeetab;
OK
empid string
fullname string
ssn string
department string
startdate string
salary int
Time taken: 0.235 seconds, Fetched: 6 row(s)
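To see the full DDL echoed back, including the TBLPROPERTIES line (the skip.header.line.count property tells Hive to skip the first, header line of the file when reading), SHOW CREATE TABLE is a quick check:
hive> show create table employeetab;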
Download the employee dataset.
Load the employee data into employeetab:-
hive> load data local inpath '/home/hduser/HIVE/employee.txt' into table employeetab;
Loading data to table hivedb.employeetab
OK
Time taken: 0.998 seconds
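As an optional sanity check, the loaded file can be listed straight from the Hive prompt. The warehouse path below is an assumption based on the default warehouse location and the hivedb database shown in the load message; adjust it to your setup:
hive> dfs -ls /user/hive/warehouse/hivedb.db/employeetab/;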
To check the first 20 records (LIMIT without an ORDER BY just returns rows in file order, not a ranked top 20):-
hive> select * from employeetab limit 20;
OK
EMP001 Faith K. Macias 845-04-3962 Marketing 27-01-2008 73500
EMP002 Lucian Q. Franklin 345-28-4935 IT/IS 01-03-2008 80000
EMP003 Blaze V. Bridges 503-53-8350 Marketing 16-04-2008 95000
EMP004 Denton Q. Dale 858-39-7967 Marketing 03-05-2008 105000
EMP005 Blossom K. Fox 245-18-5890 Engineering 11-07-2008 90000
EMP006 Kerry V. David 873-45-8675 Finance 17-07-2008 60000
EMP007 Melanie X. Baker 190-08-3679 Finance 05-10-2008 87000
EMP008 Adele M. Fulton 352-36-9553 Engineering 28-10-2008 104000
EMP009 Justina O. Jensen 645-74-0451 Marketing 05-11-2008 380050
EMP010 Yoshi J. England 558-53-1475 Marketing 09-12-2008 93000
EMP011 Brooke Y. Mccarty 129-42-6148 IT/IS 12-02-2009 180000
EMP012 Kay G. Colon 796-50-4767 Marketing 19-03-2009 100000
EMP013 Callie I. Forbes 266-48-1339 Human Resources 13-04-2009 136000
EMP014 Zachery O. Mann 663-00-3285 Marketing 28-04-2009 68000
EMP015 Lydia W. Nielsen 362-48-7035 Customer Support 13-05-2009 100000
EMP016 Alika C. Crawford 608-45-0899 Engineering 02-06-2009 144000
EMP017 Sheila D. Stephens 728-05-3317 Engineering 18-07-2009 84000
EMP018 Phelan L. Stuart 606-53-6042 Human Resources 24-08-2009 90000
EMP019 Cora X. Hardy 978-88-8706 Human Resources 05-09-2009 62000
EMP020 Reece E. Gross 577-84-8195 IT/IS 11-09-2009 120000
Find the employees with the top 20 salaries:-
hive> select * from employeetab sort by salary desc limit 20;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170128151609_a266ca7f-dcb4-4824-8d50-b14c7b7ed933
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0002, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-28 15:16:34,057 Stage-1 map = 0%, reduce = 0%
2017-01-28 15:16:55,569 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.25 sec
2017-01-28 15:17:19,433 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.77 sec
MapReduce Total cumulative CPU time: 9 seconds 770 msec
Ended Job = job_1485637695760_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 9.77 sec HDFS Read: 12010 HDFS Write: 3942 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 770 msec
OK
EMP050 Yen R. Wilder 769-98-1072 Human Resources 20-09-2012 881612
EMP043 Dean B. Tillman 149-06-0226 Human Resources 17-12-2011 807383
EMP041 Salvador I. Raymond 671-72-4788 Customer Support 27-09-2011 579621
EMP044 Caldwell T. Parks 707-97-2664 Marketing 30-01-2012 567399
EMP042 Keaton A. Gonzalez 590-90-9394 Marketing 05-10-2011 519354
EMP049 Martina T. Gilmore 861-73-6382 Finance 08-08-2012 451109
EMP009 Justina O. Jensen 645-74-0451 Marketing 05-11-2008 380050
EMP046 Ali T. Herman 143-56-8793 Engineering 27-02-2012 337522
EMP023 Galvin G. Christian 875-69-5292 Finance 14-11-2009 250500
EMP048 Wesley M. Luna 822-33-2521 Human Resources 06-07-2012 191860
EMP011 Brooke Y. Mccarty 129-42-6148 IT/IS 12-02-2009 180000
EMP028 Eric L. Hendrix 202-73-2253 Finance 19-10-2010 160000
EMP038 Georgia S. Lee 335-51-9127 Sales 17-07-2011 150200
EMP026 Benjamin R. Russo 448-36-7921 Marketing 19-04-2010 150000
EMP016 Alika C. Crawford 608-45-0899 Engineering 02-06-2009 144000
EMP013 Callie I. Forbes 266-48-1339 Human Resources 13-04-2009 136000
EMP029 Palmer V. Mitchell 885-23-8208 Marketing 20-10-2010 130000
EMP035 Lynn V. Ward 881-82-9219 Engineering 21-03-2011 124000
EMP020 Reece E. Gross 577-84-8195 IT/IS 11-09-2009 120000
EMP021 Casey T. Kim 855-03-2966 Engineering 25-09-2009 110000
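Note that SORT BY only guarantees ordering within each reducer; the result above is fully sorted only because the job happened to run with a single reducer. For a guaranteed global ordering regardless of the number of reducers, ORDER BY is the safer choice:
hive> select * from employeetab order by salary desc limit 20;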
Find the maximum salary:-
hive> select max(salary) from employeetab;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170128171820_6ef50a73-7926-4d2a-b2a0-5dbf3793f634
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0011, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0011/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-28 17:18:47,753 Stage-1 map = 0%, reduce = 0%
2017-01-28 17:19:09,197 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.32 sec
2017-01-28 17:19:30,234 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.09 sec
MapReduce Total cumulative CPU time: 8 seconds 90 msec
Ended Job = job_1485637695760_0011
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.09 sec HDFS Read: 11844 HDFS Write: 106 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 90 msec
OK
881612
Time taken: 72.391 seconds, Fetched: 1 row(s)
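Since this post is about subqueries, the same maximum can also be used to pull back the whole record with an uncorrelated IN subquery in the WHERE clause (supported in Hive 0.13 and later). A sketch:
hive> select * from employeetab
> where salary in (select max(salary) from employeetab);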
Find the total number of records:-
hive> select count(*) from employeetab;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170128172014_2ba9f37c-e297-4a94-969f-effc5a154a71
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0012, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0012/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-28 17:20:43,459 Stage-1 map = 0%, reduce = 0%
2017-01-28 17:21:03,453 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.39 sec
2017-01-28 17:21:25,606 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.94 sec
MapReduce Total cumulative CPU time: 7 seconds 940 msec
Ended Job = job_1485637695760_0012
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.94 sec HDFS Read: 11615 HDFS Write: 102 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 940 msec
OK
50
Time taken: 73.24 seconds, Fetched: 1 row(s)
Find the employee with the highest salary:-
hive> select empid,fullname,salary from employeetab sort by salary desc limit 1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170128151938_f8a1dac0-1470-4be1-ad60-b735e595ceed
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0003, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0003/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-28 15:20:04,525 Stage-1 map = 0%, reduce = 0%
2017-01-28 15:20:23,381 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.92 sec
2017-01-28 15:20:44,225 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.04 sec
MapReduce Total cumulative CPU time: 7 seconds 40 msec
Ended Job = job_1485637695760_0003
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0004, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0004/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0004
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-01-28 15:21:13,819 Stage-2 map = 0%, reduce = 0%
2017-01-28 15:21:32,486 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.87 sec
2017-01-28 15:21:53,633 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 7.32 sec
MapReduce Total cumulative CPU time: 7 seconds 320 msec
Ended Job = job_1485637695760_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.04 sec HDFS Read: 10325 HDFS Write: 138 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 7.32 sec HDFS Read: 5735 HDFS Write: 127 SUCCESS
Total MapReduce CPU Time Spent: 14 seconds 360 msec
OK
EMP050 Yen R. Wilder 881612
Time taken: 137.803 seconds, Fetched: 1 row(s)
Find the employee with the second-highest salary (the inner query keeps the two highest salaries; the outer query then picks the lower of the two):-
hive> select empid,fullname,salary from (select * from employeetab sort by salary desc limit 2) result sort by salary limit 1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170128152414_70c1c1af-45f6-4e4e-9734-8605d27633f6
Total jobs = 4
Launching Job 1 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0005, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0005/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-28 15:24:38,730 Stage-1 map = 0%, reduce = 0%
2017-01-28 15:24:59,491 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.56 sec
2017-01-28 15:25:20,925 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.04 sec
MapReduce Total cumulative CPU time: 8 seconds 40 msec
Ended Job = job_1485637695760_0005
Launching Job 2 out of 4
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0006, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0006/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0006
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-01-28 15:26:03,344 Stage-2 map = 0%, reduce = 0%
2017-01-28 15:26:28,215 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 3.25 sec
2017-01-28 15:26:46,291 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 6.71 sec
MapReduce Total cumulative CPU time: 6 seconds 710 msec
Ended Job = job_1485637695760_0006
Launching Job 3 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0007, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0007/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0007
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2017-01-28 15:27:15,372 Stage-3 map = 0%, reduce = 0%
2017-01-28 15:27:31,528 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.6 sec
2017-01-28 15:27:50,807 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 6.52 sec
MapReduce Total cumulative CPU time: 6 seconds 520 msec
Ended Job = job_1485637695760_0007
Launching Job 4 out of 4
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0008, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0008/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0008
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 1
2017-01-28 15:28:22,403 Stage-4 map = 0%, reduce = 0%
2017-01-28 15:28:42,265 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 3.2 sec
2017-01-28 15:29:03,271 Stage-4 map = 100%, reduce = 67%, Cumulative CPU 6.93 sec
2017-01-28 15:29:05,929 Stage-4 map = 100%, reduce = 100%, Cumulative CPU 8.1 sec
MapReduce Total cumulative CPU time: 8 seconds 100 msec
Ended Job = job_1485637695760_0008
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.56 sec HDFS Read: 10485 HDFS Write: 182 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 6.71 sec HDFS Read: 5165 HDFS Write: 182 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 6.52 sec HDFS Read: 4867 HDFS Write: 140 SUCCESS
Stage-Stage-4: Map: 1 Reduce: 1 Cumulative CPU: 8.1 sec HDFS Read: 5742 HDFS Write: 129 SUCCESS
Total MapReduce CPU Time Spent: 29 seconds 890 msec
OK
EMP043 Dean B. Tillman 807383
Time taken: 292.489 seconds, Fetched: 1 row(s)
Find the employee with the third-highest salary (same pattern: keep the top three salaries in the subquery, then take the lowest of those three):-
hive> select * from (
> select * from
> employeetab sort by salary desc limit 3)
> result sort by salary limit 1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170128173017_c49e7f8b-a3ef-465e-807e-305588cc2b00
Total jobs = 4
Launching Job 1 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0013, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0013/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-28 17:30:48,747 Stage-1 map = 0%, reduce = 0%
2017-01-28 17:31:12,061 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.89 sec
2017-01-28 17:31:34,211 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.22 sec
MapReduce Total cumulative CPU time: 8 seconds 220 msec
Ended Job = job_1485637695760_0013
Launching Job 2 out of 4
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0014, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0014/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0014
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-01-28 17:32:07,166 Stage-2 map = 0%, reduce = 0%
2017-01-28 17:32:28,781 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 3.26 sec
2017-01-28 17:32:53,237 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 7.57 sec
MapReduce Total cumulative CPU time: 7 seconds 570 msec
Ended Job = job_1485637695760_0014
Launching Job 3 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0015, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0015/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0015
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2017-01-28 17:33:25,838 Stage-3 map = 0%, reduce = 0%
2017-01-28 17:33:50,358 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 4.5 sec
2017-01-28 17:34:11,664 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 9.14 sec
MapReduce Total cumulative CPU time: 9 seconds 140 msec
Ended Job = job_1485637695760_0015
Launching Job 4 out of 4
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1485637695760_0016, Tracking URL = http://localhost:8088/proxy/application_1485637695760_0016/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1485637695760_0016
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 1
2017-01-28 17:34:43,884 Stage-4 map = 0%, reduce = 0%
2017-01-28 17:35:03,231 Stage-4 map = 100%, reduce = 0%
2017-01-28 17:35:25,687 Stage-4 map = 100%, reduce = 67%, Cumulative CPU 6.75 sec
2017-01-28 17:35:28,248 Stage-4 map = 100%, reduce = 100%, Cumulative CPU 8.29 sec
MapReduce Total cumulative CPU time: 8 seconds 290 msec
Ended Job = job_1485637695760_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.22 sec HDFS Read: 11177 HDFS Write: 348 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 7.57 sec HDFS Read: 5934 HDFS Write: 348 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 9.14 sec HDFS Read: 5986 HDFS Write: 184 SUCCESS
Stage-Stage-4: Map: 1 Reduce: 1 Cumulative CPU: 8.29 sec HDFS Read: 6736 HDFS Write: 173 SUCCESS
Total MapReduce CPU Time Spent: 33 seconds 220 msec
OK
EMP041 Salvador I. Raymond 671-72-4788 Customer Support 27-09-2011 579621
Time taken: 313.268 seconds, Fetched: 1 row(s)
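The nested sort by ... limit pattern above works, but every extra level of nesting adds more MapReduce jobs (four jobs for the second- and third-highest queries against two for the highest). An alternative sketch for the Nth-highest salary uses the dense_rank() window function (available since Hive 0.11) and needs only one ranking pass:
hive> select empid, fullname, salary
> from (select empid, fullname, salary,
>       dense_rank() over (order by salary desc) as rnk
>       from employeetab) ranked
> where rnk = 2;
Changing rnk = 2 to rnk = 3 returns the third-highest salary, and so on.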