Hive Basics
Show existing databases:-
hive> show databases;
OK
default
mydb
Time taken: 0.224 seconds, Fetched: 2 row(s)
Show tables in existing database:-
hive> show tables in default;
OK
retailtab
Time taken: 0.111 seconds, Fetched: 1 row(s)
hive> show tables in mydb;
OK
jointab1
jointab2
nycauseofdeath
nydeath
retailtab
samptab
statictab
wikitab
Time taken: 0.13 seconds, Fetched: 8 row(s)
Create database:-
hive> create database hivedb;
OK
Time taken: 0.975 seconds
To print the current database:-
hive> set hive.cli.print.current.db=true;
hive (mydb)>
Change the database:-
hive (mydb)> use hivedb;
OK
Time taken: 0.088 seconds
hive (hivedb)>
Create database with DBPROPERTIES:-
hive (hivedb)> create database hivedatabase
> with DBPROPERTIES('Creator'='Snehal Thakur','Date'='22-01-2017');
OK
Time taken: 0.923 seconds
hive (hivedb)> describe database hivedatabase;
OK
hivedatabase hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db hduser USER
Time taken: 0.187 seconds, Fetched: 1 row(s)
hive (hivedb)> describe database extended hivedatabase;
OK
hivedatabase hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db hduser USER {Date=22-01-2017, Creator=Snehal Thakur}
Time taken: 0.072 seconds, Fetched: 1 row(s)
Alter database:-
hive (hivedb)> describe database extended hivedatabase;
OK
hivedatabase hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db hduser USER {Date=22-01-2017, Creator=Snehal Thakur}
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive (hivedb)> alter database hivedatabase set dbproperties('topic'='Hive');
OK
Time taken: 1.118 seconds
hive (hivedb)> describe database extended hivedatabase;
OK
hivedatabase hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db hduser USER {topic=Hive, Creator=Snehal Thakur, Date=22-01-2017}
Time taken: 0.093 seconds, Fetched: 1 row(s)
Drop database:-
hive (hivedb)> show databases;
OK
default
hivedatabase
hivedb
mydb
Time taken: 0.06 seconds, Fetched: 4 row(s)
hive (hivedb)> drop database hivedatabase;
OK
Time taken: 5.136 seconds
hive (hivedb)> show databases;
OK
default
hivedb
mydb
Time taken: 0.063 seconds, Fetched: 3 row(s)
Managed tables in Hive:-
When we create a table in Hive, by default Hive will manage the data, which means Hive moves the data into its warehouse directory. These are the default (internal) tables of Hive.
Default warehouse location is "hdfs://user/hive/warehouse/managed_table".
Syntax:-
create table <tablename>(col1 datatype,col2 datatype.........)
row format delimited
fields terminated by 'delimiter character'
stored as <fileformat>;
Create Managed table:-
hive (hivedb)> create table ny_cause_of_death(year int,leading_cause string,sex string,race_ethnicity string,death string,deathrate double,ageadjusted_deathrate double)
> row format delimited
> fields terminated by ','
> stored as textfile
> TBLPROPERTIES("skip.header.line.count"="1");
OK
Time taken: 1.644 seconds
The create table statement follows SQL conventions.The row format clause specifies each row is terminated by the newline character.
The fields terminated by clause specifies that the fields associated with the table are to be delimited by a comma ",".
The stored as clause specifies that the table will be stored in textfile format.
The TBLPROPERTIES can be used to express essential metadata about the database connection.
To see the detailed table information:-
hive (hivedb)> describe formatted ny_cause_of_death;
OK
# col_name data_type comment
year int
leading_cause string
sex string
race_ethnicity string
death string
deathrate double
ageadjusted_deathrate double
# Detailed Table Information
Database: hivedb
Owner: hduser
CreateTime: Sat Jan 21 20:04:49 PST 2017
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://localhost:9000/user/hive/warehouse/hivedb.db/ny_cause_of_death
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
numFiles 0
numRows 0
rawDataSize 0
skip.header.line.count 1
totalSize 0
transient_lastDdlTime 1485057889
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.203 seconds, Fetched: 38 row(s)
download input data file New_York_City_Leading_Causes_of_Death.csv
To load the data in Managed tables:-
There are two ways to load the data
1. Local Mode
2. HDFS Mode
1. Local Mode:-
In this mode, Hive will accept the input data from any of the LFS (local file system) path. Once the data get loaded, it will be a part of hive warehouse hdfs path.
Syntax: load data local inpath '<localpath>' into table <tablename>
hive (hivedb)> describe ny_cause_of_death;
OK
year int
leading_cause string
sex string
race_ethnicity string
death string
deathrate double
ageadjusted_deathrate double
Time taken: 1.546 seconds, Fetched: 7 row(s)
hive (hivedb)> load data local inpath '/home/hduser/HIVE/NYLeadingCausesOfDeath.csv' into table ny_cause_of_death;
Loading data to table hivedb.ny_cause_of_death
OK
Time taken: 3.629 seconds
hive (hivedb)> select * from ny_cause_of_death limit 20;
OK
2014 Diabetes Mellitus (E10-E14) F Other Race/ Ethnicity 11 NULL NULL
2011 Cerebrovascular Disease (Stroke: I60-I69) M White Non-Hispanic 290 21.7 18.2
2008 Malignant Neoplasms (Cancer: C00-C97) M Not Stated/Unknown 60 NULL NULL
2010 Malignant Neoplasms (Cancer: C00-C97) F Hispanic 1045 85.9 98.5
2012 Cerebrovascular Disease (Stroke: I60-I69) M Black Non-Hispanic 170 19.9 23.3
2007 Mental and Behavioral Disorders due to Use of Alcohol (F10) M Not Stated/Unknown . NULL NULL
2011 All Other Causes F Not Stated/Unknown 14 NULL NULL
2007 Chronic Lower Respiratory Diseases (J40-J47) F Black Non-Hispanic 163 15.5 14.8
2012 "Essential Hypertension and Renal Diseases (I10 I12)" F Hispanic 101.0 8.2
2009 "Diseases of Heart (I00-I09 I11 I13 I20-I51)" NULL NULL
2011 Chronic Lower Respiratory Diseases (J40-J47) M Other Race/ Ethnicity . NULL NULL
2008 Cerebrovascular Disease (Stroke: I60-I69) F Not Stated/Unknown 13 NULL NULL
2013 "Diseases of Heart (I00-I09 I11 I13 I20-I51)" NULL NULL
2010 Cerebrovascular Disease (Stroke: I60-I69) F Not Stated/Unknown . NULL NULL
2012 "Diseases of Heart (I00-I09 I11 I13 I20-I51)" NULL NULL
2012 Malignant Neoplasms (Cancer: C00-C97) M Black Non-Hispanic 1563 182.7 215.3
2011 Cerebrovascular Disease (Stroke: I60-I69) F Asian and Pacific Islander 98 17.2 20.6
2014 "Congenital Malformations Deformations and Chromosomal Abnormalities (Q00-Q99)" F NULL NULL
2009 All Other Causes F Asian and Pacific Islander 220 40.1 48.9
2014 Diabetes Mellitus (E10-E14) M Asian and Pacific Islander 71 12.4 14.2
Time taken: 2.586 seconds, Fetched: 20 row(s)
In Local Mode, file from LFS(local file system) will get copied into Hive warehouse path.
In HDFS Mode, file from HFDS directory will get moved into Hive warehouse path.
2. HDFS Mode:-
In this mode, Hive will accept the input data from HDFS path. Once the data get loaded, it will be a part of Hive warehouse.
Syntax: load data inpath '<localpath>' into table <tablename>
hduser@localhost HIVE]$ hdfs dfs -put NYLeadingCausesOfDeath.csv /sampDir
17/01/23 01:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hduser@localhost HIVE]$ hdfs dfs -ls /sampDir
17/01/23 01:45:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 5 hduser supergroup 340 2017-01-18 04:41 /sampDir/LogFiles.log
-rw-r--r-- 1 hduser supergroup 92389 2017-01-23 01:45 /sampDir/NYLeadingCausesOfDeath.csv
We can also copy the schema (but not the data) of an existing table:
CREATE TABLE <NewTableName> LIKE <ExistingTableName>;
hive (hivedb)> show tables;
OK
ny_cause_of_death
Time taken: 0.198 seconds, Fetched: 1 row(s)
hive (hivedb)> create table ny_cause_of_death_new like ny_cause_of_death;
OK
Time taken: 1.066 seconds
hive (hivedb)> show tables;
OK
ny_cause_of_death
ny_cause_of_death_new
Time taken: 0.099 seconds, Fetched: 2 row(s)
hive (hivedb)> desc ny_cause_of_death_new;
OK
year int
leading_cause string
sex string
race_ethnicity string
death string
deathrate double
ageadjusted_deathrate double
Time taken: 0.26 seconds, Fetched: 7 row(s)
hive (hivedb)> select * from ny_cause_of_death_new;
OK
Time taken: 0.423 seconds
hive (hivedb)> load data inpath '/sampDir/NYLeadingCausesOfDeath.csv' into table ny_cause_of_death_new;
Loading data to table hivedb.ny_cause_of_death_new
OK
Time taken: 2.226 seconds
hive (hivedb)> select * from ny_cause_of_death_new limit 10;
OK
NULL Leading Cause Sex Race Ethnicity Deaths NULL NULL
2014 Diabetes Mellitus (E10-E14) F Other Race/ Ethnicity 11 NULL NULL
2011 Cerebrovascular Disease (Stroke: I60-I69) M White Non-Hispanic 290 21.7 18.2
2008 Malignant Neoplasms (Cancer: C00-C97) M Not Stated/Unknown 60 NULL NULL
2010 Malignant Neoplasms (Cancer: C00-C97) F Hispanic 1045 85.9 98.5
2012 Cerebrovascular Disease (Stroke: I60-I69) M Black Non-Hispanic 170 19.9 23.3
2007 Mental and Behavioral Disorders due to Use of Alcohol (F10) M Not Stated/Unknown . NULL NULL
2011 All Other Causes F Not Stated/Unknown 14 NULL NULL
2007 Chronic Lower Respiratory Diseases (J40-J47) F Black Non-Hispanic 163 15.5 14.8
2012 "Essential Hypertension and Renal Diseases (I10 I12)" F Hispanic 101.0 8.2
Time taken: 0.335 seconds, Fetched: 10 row(s)
In HDFS mode, file "NYLeadingCausesOfDeath.csv" is not present in "/sampDir", it is moved to Hive warehouse path
i.e "/user/hive/warehouse/hivedb"
[hduser@localhost HIVE]$ hdfs dfs -ls /sampDir
17/01/23 02:05:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 5 hduser supergroup 340 2017-01-18 04:41 /sampDir/LogFiles.log
Dropping the Managed table:-
hive (hivedb)> show tables;
OK
ny_cause_of_death
ny_cause_of_death_new
reuse_extrnl_tab
Time taken: 0.201 seconds, Fetched: 3 row(s)
hive (hivedb)> drop table ny_cause_of_death;
OK
Time taken: 0.699 seconds
hive (hivedb)> show tables;
OK
ny_cause_of_death_new
reuse_extrnl_tab
Time taken: 0.084 seconds, Fetched: 2 row(s)
hive (hivedb)> select * from ny_cause_of_death;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'ny_cause_of_death'
thakyou it vry nice blog for beginners
ReplyDeletehttps://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
Hive Database And Managed Table >>>>> Download Now
Delete>>>>> Download Full
Hive Database And Managed Table >>>>> Download LINK
>>>>> Download Now
Hive Database And Managed Table >>>>> Download Full
>>>>> Download LINK x5
I am happy for sharing on this blog its awesome blog I really impressed. thanks for sharing. Great efforts.
ReplyDeleteLooking for Big Data Hadoop Training Institute in Bangalore, India. Prwatech is the best one to offers computer training courses including IT software course in Bangalore, India.
Also it provides placement assistance service in Bangalore for IT. big data certification courses in bangalore.
Thank you.Very nice blog with very useful content.
ReplyDeleteKeep posting more Blogs.
big data hadoop certification
best big data course online
learn big data online
big data hadoop course
Hive Database And Managed Table >>>>> Download Now
ReplyDelete>>>>> Download Full
Hive Database And Managed Table >>>>> Download LINK
>>>>> Download Now
Hive Database And Managed Table >>>>> Download Full
>>>>> Download LINK Vp