Hive database and Managed table

Hive Basics

Show existing databases:-

hive> show databases;
OK
default
mydb
Time taken: 0.224 seconds, Fetched: 2 row(s)

Show tables in existing database:-

hive> show tables in default;
OK
retailtab
Time taken: 0.111 seconds, Fetched: 1 row(s)

hive> show tables in mydb;
OK
jointab1
jointab2
nycauseofdeath
nydeath
retailtab
samptab
statictab
wikitab
Time taken: 0.13 seconds, Fetched: 8 row(s)

Create database:-

hive> create database hivedb;
OK
Time taken: 0.975 seconds

To print the current database:-

hive> set hive.cli.print.current.db=true;
hive (mydb)>

Change the database:-

hive (mydb)> use hivedb;
OK
Time taken: 0.088 seconds
hive (hivedb)>

Create database with DBPROPERTIES:-

hive (hivedb)> create database hivedatabase
             > with DBPROPERTIES('Creator'='Snehal Thakur','Date'='22-01-2017');
OK
Time taken: 0.923 seconds

hive (hivedb)> describe database hivedatabase;
OK
hivedatabase                hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db        hduser        USER        
Time taken: 0.187 seconds, Fetched: 1 row(s)

hive (hivedb)> describe database extended hivedatabase;
OK
hivedatabase                hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db        hduser        USER        {Date=22-01-2017, Creator=Snehal Thakur}
Time taken: 0.072 seconds, Fetched: 1 row(s)

Alter database:-

hive (hivedb)> describe database extended hivedatabase;
OK
hivedatabase                hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db        hduser        USER        {Date=22-01-2017, Creator=Snehal Thakur}
Time taken: 0.072 seconds, Fetched: 1 row(s)

hive (hivedb)> alter database hivedatabase set dbproperties('topic'='Hive');
OK
Time taken: 1.118 seconds

hive (hivedb)> describe database extended hivedatabase;
OK
hivedatabase                hdfs://localhost:9000/user/hive/warehouse/hivedatabase.db        hduser        USER        {topic=Hive, Creator=Snehal Thakur, Date=22-01-2017}
Time taken: 0.093 seconds, Fetched: 1 row(s)

Drop database:-

hive (hivedb)> show databases;
OK
default
hivedatabase
hivedb
mydb
Time taken: 0.06 seconds, Fetched: 4 row(s)

hive (hivedb)> drop database hivedatabase;
OK
Time taken: 5.136 seconds

hive (hivedb)> show databases;
OK
default
hivedb
mydb
Time taken: 0.063 seconds, Fetched: 3 row(s)

Managed tables in Hive:-
When we create a table in Hive, by default Hive will manage the data, which means Hive moves the data into its warehouse directory. These are the default (internal) tables of Hive.
Default warehouse location is "hdfs://user/hive/warehouse/managed_table".

Syntax:-

create table <tablename>(col1 datatype,col2 datatype.........)

row format delimited

fields terminated by 'delimiter character'

stored as <fileformat>;

Create Managed table:-

hive (hivedb)> create table ny_cause_of_death(year int,leading_cause string,sex string,race_ethnicity string,death string,deathrate double,ageadjusted_deathrate double)
             > row format delimited
             > fields terminated by ','
             > stored as textfile
             > TBLPROPERTIES("skip.header.line.count"="1");
OK
Time taken: 1.644 seconds

The create table statement follows SQL conventions.
The row format clause specifies each row is terminated by the newline character.
The fields terminated by clause specifies that the fields associated with the table are to be delimited by a comma ",".
The stored as clause specifies that the table will be stored in textfile format.
The TBLPROPERTIES can be used to express essential metadata about the database connection.

To see the detailed table information:-

hive (hivedb)> describe formatted ny_cause_of_death;
OK
# col_name                    data_type                   comment             
                  
year                        int                                             
leading_cause               string                                          
sex                         string                                          
race_ethnicity              string                                          
death                       string                                          
deathrate                   double                                          
ageadjusted_deathrate       double                                          
                  
# Detailed Table Information                  
Database:                   hivedb                       
Owner:                      hduser                       
CreateTime:                 Sat Jan 21 20:04:49 PST 2017         
LastAccessTime:             UNKNOWN                      
Retention:                  0                            
Location:                   hdfs://localhost:9000/user/hive/warehouse/hivedb.db/ny_cause_of_death         
Table Type:                 MANAGED_TABLE                
Table Parameters:                  
        COLUMN_STATS_ACCURATE        {\"BASIC_STATS\":\"true\"}
        numFiles                    0                   
        numRows                     0                   
        rawDataSize                 0                   
        skip.header.line.count        1                   
        totalSize                   0                   
        transient_lastDdlTime        1485057889          
                  
# Storage Information                  
SerDe Library:              org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         
InputFormat:                org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:               org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat         
Compressed:                 No                           
Num Buckets:                -1                           
Bucket Columns:             []                           
Sort Columns:               []                           
Storage Desc Params:                  
        field.delim                 ,                   
        serialization.format        ,                   
Time taken: 0.203 seconds, Fetched: 38 row(s)

download input data file New_York_City_Leading_Causes_of_Death.csv

To load the data in Managed tables:-
There are two ways to load the data
1. Local Mode
2. HDFS Mode

1. Local Mode:-
In this mode, Hive will accept the input data from any of the LFS (local file system) path. Once the data get loaded, it will be a part of hive warehouse hdfs path.

Syntax: load data local inpath '<localpath>' into table <tablename>

hive (hivedb)> describe ny_cause_of_death;
OK
year                        int                                             
leading_cause               string                                          
sex                         string                                          
race_ethnicity              string                                          
death                       string                                          
deathrate                   double                                          
ageadjusted_deathrate        double                                          
Time taken: 1.546 seconds, Fetched: 7 row(s)

hive (hivedb)> load data local inpath '/home/hduser/HIVE/NYLeadingCausesOfDeath.csv' into table ny_cause_of_death;
Loading data to table hivedb.ny_cause_of_death
OK
Time taken: 3.629 seconds

hive (hivedb)> select * from  ny_cause_of_death limit 20;
OK
2014        Diabetes Mellitus (E10-E14)        F        Other Race/ Ethnicity        11        NULL        NULL
2011        Cerebrovascular Disease (Stroke: I60-I69)        M        White Non-Hispanic        290        21.7        18.2
2008        Malignant Neoplasms (Cancer: C00-C97)        M        Not Stated/Unknown        60        NULL        NULL
2010        Malignant Neoplasms (Cancer: C00-C97)        F        Hispanic        1045        85.9        98.5
2012        Cerebrovascular Disease (Stroke: I60-I69)        M        Black Non-Hispanic        170        19.9        23.3
2007        Mental and Behavioral Disorders due to Use of Alcohol (F10)        M        Not Stated/Unknown        .        NULL        NULL
2011        All Other Causes        F        Not Stated/Unknown        14        NULL        NULL
2007        Chronic Lower Respiratory Diseases (J40-J47)        F        Black Non-Hispanic        163        15.5        14.8
2012        "Essential Hypertension and Renal Diseases (I10         I12)"        F        Hispanic        101.0        8.2
2009        "Diseases of Heart (I00-I09         I11         I13         I20-I51)"        NULL        NULL
2011        Chronic Lower Respiratory Diseases (J40-J47)        M        Other Race/ Ethnicity        .        NULL        NULL
2008        Cerebrovascular Disease (Stroke: I60-I69)        F        Not Stated/Unknown        13        NULL        NULL
2013        "Diseases of Heart (I00-I09         I11         I13         I20-I51)"        NULL        NULL
2010        Cerebrovascular Disease (Stroke: I60-I69)        F        Not Stated/Unknown        .        NULL        NULL
2012        "Diseases of Heart (I00-I09         I11         I13         I20-I51)"        NULL        NULL
2012        Malignant Neoplasms (Cancer: C00-C97)        M        Black Non-Hispanic        1563        182.7        215.3
2011        Cerebrovascular Disease (Stroke: I60-I69)        F        Asian and Pacific Islander        98        17.2        20.6
2014        "Congenital Malformations         Deformations         and Chromosomal Abnormalities (Q00-Q99)"        F        NULL        NULL
2009        All Other Causes        F        Asian and Pacific Islander        220        40.1        48.9
2014        Diabetes Mellitus (E10-E14)        M        Asian and Pacific Islander        71        12.4        14.2
Time taken: 2.586 seconds, Fetched: 20 row(s)

In Local Mode, file from LFS(local file system) will get copied into Hive warehouse path.
In HDFS Mode, file from HFDS directory will get moved into Hive warehouse path.

2. HDFS Mode:-
In this mode, Hive will accept the input data from HDFS path. Once the data get loaded, it will be a part of Hive warehouse.
Syntax: load data inpath '<localpath>' into table <tablename>

hduser@localhost HIVE]$ hdfs dfs -put NYLeadingCausesOfDeath.csv /sampDir
17/01/23 01:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hduser@localhost HIVE]$ hdfs dfs -ls /sampDir
17/01/23 01:45:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   5 hduser supergroup        340 2017-01-18 04:41 /sampDir/LogFiles.log
-rw-r--r--   1 hduser supergroup      92389 2017-01-23 01:45 /sampDir/NYLeadingCausesOfDeath.csv

We can also copy the schema (but not the data) of an existing table:
CREATE TABLE <NewTableName> LIKE <ExistingTableName>;

hive (hivedb)> show tables;
OK
ny_cause_of_death
Time taken: 0.198 seconds, Fetched: 1 row(s)

hive (hivedb)> create table ny_cause_of_death_new like ny_cause_of_death;
OK
Time taken: 1.066 seconds

hive (hivedb)> show tables;
OK
ny_cause_of_death
ny_cause_of_death_new
Time taken: 0.099 seconds, Fetched: 2 row(s)

hive (hivedb)> desc ny_cause_of_death_new;
OK
year                        int                                             
leading_cause               string                                          
sex                         string                                          
race_ethnicity              string                                          
death                       string                                          
deathrate                   double                                          
ageadjusted_deathrate        double                                          
Time taken: 0.26 seconds, Fetched: 7 row(s)

hive (hivedb)> select * from  ny_cause_of_death_new;
OK
Time taken: 0.423 seconds

hive (hivedb)> load data inpath '/sampDir/NYLeadingCausesOfDeath.csv' into table ny_cause_of_death_new;
Loading data to table hivedb.ny_cause_of_death_new
OK
Time taken: 2.226 seconds

hive (hivedb)> select * from  ny_cause_of_death_new limit 10;
OK
NULL Leading Cause Sex Race Ethnicity Deaths NULL NULL
2014 Diabetes Mellitus (E10-E14) F Other Race/ Ethnicity 11 NULL NULL
2011 Cerebrovascular Disease (Stroke: I60-I69) M White Non-Hispanic 290 21.7 18.2
2008 Malignant Neoplasms (Cancer: C00-C97) M Not Stated/Unknown 60 NULL NULL
2010 Malignant Neoplasms (Cancer: C00-C97) F Hispanic 1045 85.9 98.5
2012 Cerebrovascular Disease (Stroke: I60-I69) M Black Non-Hispanic 170 19.9 23.3
2007 Mental and Behavioral Disorders due to Use of Alcohol (F10) M Not Stated/Unknown . NULL NULL
2011 All Other Causes F Not Stated/Unknown 14 NULL NULL
2007 Chronic Lower Respiratory Diseases (J40-J47) F Black Non-Hispanic 163 15.5 14.8
2012 "Essential Hypertension and Renal Diseases (I10  I12)" F Hispanic 101.0 8.2
Time taken: 0.335 seconds, Fetched: 10 row(s)

In HDFS mode, file "NYLeadingCausesOfDeath.csv" is not present in "/sampDir", it is moved to Hive warehouse path
i.e "/user/hive/warehouse/hivedb"

[hduser@localhost HIVE]$ hdfs dfs -ls /sampDir
17/01/23 02:05:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   5 hduser supergroup        340 2017-01-18 04:41 /sampDir/LogFiles.log

Dropping the Managed table:-

hive (hivedb)> show tables;
OK
ny_cause_of_death
ny_cause_of_death_new
reuse_extrnl_tab
Time taken: 0.201 seconds, Fetched: 3 row(s)

hive (hivedb)> drop table ny_cause_of_death;
OK
Time taken: 0.699 seconds
hive (hivedb)> show tables;
OK
ny_cause_of_death_new
reuse_extrnl_tab
Time taken: 0.084 seconds, Fetched: 2 row(s)

hive (hivedb)> select * from ny_cause_of_death;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'ny_cause_of_death'

After dropping the Managed table, Metadata as well as actual data will be removed.

Comments

Unknown8 July 2018 at 05:09
thakyou it vry nice blog for beginners
https://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
Prwatech10 March 2020 at 05:45
I am happy for sharing on this blog its awesome blog I really impressed. thanks for sharing. Great efforts.

Looking for Big Data Hadoop Training Institute in Bangalore, India. Prwatech is the best one to offers computer training courses including IT software course in Bangalore, India.

Also it provides placement assistance service in Bangalore for IT. big data certification courses in bangalore.
veera cynixit10 July 2020 at 02:13
Thank you.Very nice blog with very useful content.
Keep posting more Blogs.
big data hadoop certification
best big data course online
learn big data online
big data hadoop course

Anonymous19 April 2022 at 22:19
Hive Database And Managed Table >>>>> Download Now

>>>>> Download Full

Hive Database And Managed Table >>>>> Download LINK

>>>>> Download Now

Hive Database And Managed Table >>>>> Download Full

>>>>> Download LINK Vp

Snehal's Big Data Spark Hadoop Zone

Search This Blog