Hive: External Tables
External Tables:-
As the name implies, these tables are external to the Hive warehouse location. i.e. In external tables, data will not be stored generally on the Hive warehouse location instead external tables will store the data in a location which we are specifying in schema creation (definition) time by using "EXTERNAL" keyword.
When we drop an external table, Hive will leave the data untouched and Hive will only delete the metadata.
Syntax:-
create external table <tablename>(col1 datatype,col2 datatype.........)
row format delimited
fields terminated by 'delimiter character'
stored as <fileformat>
location 'HDFS path';
hive (hivedb)> create external table external_tab(year int,cause string,sex string,race_ethnicity string,death string,deathrate double,ageadjusted_deathrate double)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> stored as textfile
> location '/externaltab';
OK
Time taken: 2.033 seconds
hive (hivedb)> desc formatted external_tab;
OK
# col_name data_type comment
year int
cause string
sex string
race_ethnicity string
death string
deathrate double
ageadjusted_deathrate double
# Detailed Table Information
Database: hivedb
Owner: hduser
CreateTime: Tue Jan 24 01:02:02 PST 2017
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://localhost:9000/externaltab
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
numFiles 1
totalSize 92389
transient_lastDdlTime 1485248583
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
line.delim \n
serialization.format ,
Time taken: 3.05 seconds, Fetched: 36 row(s)
hive (hivedb)> load data local inpath '/home/hduser/HIVE/NYLeadingCausesOfDeath.csv' into table external_tab;
Loading data to table hivedb.external_tab
OK
Time taken: 4.756 seconds
hive (hivedb)> select year,cause,sex from external_tab limit 10;
OK
NULL Leading Cause Sex
2014 Diabetes Mellitus (E10-E14) F
2011 Cerebrovascular Disease (Stroke: I60-I69) M
2008 Malignant Neoplasms (Cancer: C00-C97) M
2010 Malignant Neoplasms (Cancer: C00-C97) F
2012 Cerebrovascular Disease (Stroke: I60-I69) M
2007 Mental and Behavioral Disorders due to Use of Alcohol (F10) M
2011 All Other Causes F
2007 Chronic Lower Respiratory Diseases (J40-J47) F
2012 "Essential Hypertension and Renal Diseases (I10 I12)"
Time taken: 4.607 seconds, Fetched: 10 row(s)
Loading the data in the External table is same as loading the data in Managed table.
Dropping the External table:-
hive (hivedb)> drop table external_tab;
OK
Time taken: 10.196 seconds
hive (hivedb)> select * from external_tab;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'external_tab'
hive (hivedb)> show tables;
OK
ny_cause_of_death
ny_cause_of_death_new
Time taken: 0.114 seconds, Fetched: 2 row(s)
After dropping the external table, we can see the actual data "NYLeadingCausesOfDeath.csv" is still present in "/externaltab" directory.We can reuse the same data which is stored in '/externaltab' directory. But in this case, no need to load the data again.
hive (hivedb)> create external table reuse_extrnl_tab(year int,cause string,sex string,race_ethnicity string,death string,deathrate double,ageadjusted_deathrate double)
> row format delimited
> fields terminated by ','
> stored as textfile
> location '/externaltab'
> TBLPROPERTIES ("skip.header.line.count"="1");
OK
Time taken: 0.573 seconds
hive (hivedb)> select * from reuse_extrnl_tab limit 10;
OK
2014 Diabetes Mellitus (E10-E14) F Other Race/ Ethnicity 11 NULL NULL
2011 Cerebrovascular Disease (Stroke: I60-I69) M White Non-Hispanic 290 21.7 18.2
2008 Malignant Neoplasms (Cancer: C00-C97) M Not Stated/Unknown 60 NULL NULL
2010 Malignant Neoplasms (Cancer: C00-C97) F Hispanic 1045 85.9 98.5
2012 Cerebrovascular Disease (Stroke: I60-I69) M Black Non-Hispanic 170 19.9 23.3
2007 Mental and Behavioral Disorders due to Use of Alcohol (F10) M Not Stated/Unknown . NULL NULL
2011 All Other Causes F Not Stated/Unknown 14 NULL NULL
2007 Chronic Lower Respiratory Diseases (J40-J47) F Black Non-Hispanic 163 15.5 14.8
2012 "Essential Hypertension and Renal Diseases (I10 I12)" F Hispanic 101.0 8.2
2009 "Diseases of Heart (I00-I09 I11 I13 I20-I51)" NULL NULL
Time taken: 0.543 seconds, Fetched: 10 row(s)
thakyou it vry nice blog for beginners
ReplyDeletehttps://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
Hive External Table >>>>> Download Now
Delete>>>>> Download Full
Hive External Table >>>>> Download LINK
>>>>> Download Now
Hive External Table >>>>> Download Full
>>>>> Download LINK ym
Thank you for sharing your knowledge with us.
ReplyDeleteKeep updating.
big data hadoop certification
big data online course
Hive External Table >>>>> Download Now
ReplyDelete>>>>> Download Full
Hive External Table >>>>> Download LINK
>>>>> Download Now
Hive External Table >>>>> Download Full
>>>>> Download LINK gg