Hive Partitioning
Partition is horizontally dividing the data into number of slice in equal and manageable manner.
Every partition is stored as directory within data warehouse table.
In data warehouse, this partition concept is common, but there are two types of Partitions are available in data warehouse concepts.
Every partition is stored as directory within data warehouse table.
In data warehouse, this partition concept is common, but there are two types of Partitions are available in data warehouse concepts.
1. SQL Partition
2. Hive Partition
1. SQL Partition:-
The main work of SQL Partition is dividing the large amount of data into number of slices by based on table single column only.
SQL Partition is very hard comparing to Hive Partition because SQL server only support one column for Partition.
2. Hive Partition:-
The main work of Hive Partition is also same as SQL Partition, but the main difference between SQL Partition and Hive Partition is SQL Partition only supports single column in table. While in Hive Partition, it supports multiple columns in a table.
Using Partitions can make it faster to do queries on slices of the data.
In Hive Partition concept there is two different type of partitions:-
i. Static Partition
ii. Dynamic Partition
i. Static Partition
In Static Partition, we statically add a partition in table and move the file into the partition of the table. We need to specify the partition column value in each and every LOAD statement.
Creation of Partitioned Table:-
Syntax:-
CREATE TABLE TableName(ColName1 DataType1, ColName2 DataType2)
PARTITIONED BY (PartitionColumn1 DataType2, PartitionColumn2 DataType2)
Creation of Partitioned Table:-
Syntax:-
CREATE TABLE TableName(ColName1 DataType1, ColName2 DataType2)
PARTITIONED BY (PartitionColumn1 DataType2, PartitionColumn2 DataType2)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'delimiter character'
STORED AS <fileformat>;
hive (hivedb)> create table statictable (id int,name string,salary int)
> partitioned by (city string)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
OK
Time taken: 23.871 seconds
In description, we can see the Partitioned column "city"
hive (hivedb)> describe statictable;
OK
id int
name string
salary int
city string
# Partition Information
# col_name data_type comment
city string
Time taken: 1.498 seconds, Fetched: 9 row(s)
Loading the data in the table:-
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data1.log' into table statictable PARTITION(city= 'Pune');
Loading data to table hivedb.statictable partition (city=Pune)
OK
Time taken: 15.667 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table statictable PARTITION(city= 'Mumbai');
Loading data to table hivedb.statictable partition (city=Mumbai)
OK
Time taken: 2.727 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data3.log' into table statictable PARTITION(city= 'Nagpur');
Loading data to table hivedb.statictable partition (city=Nagpur)
OK
Time taken: 1.76 seconds
To view the created Partitions:-
hive (hivedb)> show partitions statictable;
OK
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.939 seconds, Fetched: 3 row(s)
To view the content of the table:-
hive (hivedb)> select * from statictable;
OK
200 Sonu 5000 Mumbai
201 Nik 3000 Mumbai
202 Sam 4000 Mumbai
203 Chotu 4000 Mumbai
.
.
.
224 Suruchi 42500 Pune
225 Vaibhav 28500 Pune
226 Kaustubh 57500 Pune
227 Jitesh 33500 Pune
228 Mayur 44500 Pune
229 Mohit 41500 Pune
Time taken: 0.677 seconds, Fetched: 80 row(s)
Adding a Partition:-
Syntax:- ALTER TABLE TableName ADD PARTITION PartitionName;
hive (hivedb)> alter table statictable ADD PARTITION (city='Chennai');
OK
Time taken: 2.483 seconds
hive (hivedb)> show partitions statictable;
OK
city=Chennai
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.561 seconds, Fetched: 4 row(s)
New Partition is added "city=Chennai"
Renaming a Partition:-
Syntax:- ALTER TABLE PARTITION ExistingPartition RENAME TO PARTITION NewPartition
hive (hivedb)> alter table statictable PARTITION (city='Chennai') RENAME TO PARTITION (city='Indore');
OK
Time taken: 2.964 seconds
hive (hivedb)> show partitions statictable;
OK
city=Indore
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.549 seconds, Fetched: 4 row(s)
Partition is renamed from Chennai to IndoreDropping a Partition:-
Syntax:- ALTER TABLE TableName DROP PARTITION PartitionName;
hive (hivedb)> alter table statictable drop PARTITION (city='Indore');
Dropped the partition city=Indore
OK
Time taken: 5.042 seconds
hive (hivedb)> show partitions statictable;
OK
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.463 seconds, Fetched: 3 row(s)
Creation of Partition on multiple columns:-
hive (hivedb)> create table multiplepartitiontable (id int,name string,salary int)
> partitioned by (city string,dated string)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
OK
Time taken: 0.637 seconds
In description, we can see the Partitioned column "city and dated" columns
hive (hivedb)> describe multiplepartitiontable;
OK
id int
name string
salary int
city string
dated string
# Partition Information
# col_name data_type comment
city string
dated string
Time taken: 0.137 seconds, Fetched: 11 row(s)
Loading the data in the table:-
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data1.log' into table multiplepartitiontable PARTITION(city= 'Pune',dated='01/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Pune, dated=01/02/2017)
OK
Time taken: 1.88 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table multiplepartitiontable PARTITION(city= 'Pune',dated='02/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Pune, dated=02/02/2017)
OK
Time taken: 1.478 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data3.log' into table multiplepartitiontable PARTITION(city= 'Nagpur',dated='03/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Nagpur, dated=03/02/2017)
OK
Time taken: 1.394 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table multiplepartitiontable PARTITION(city= 'Mumbai',dated='03/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Mumbai, dated=03/02/2017)
OK
Time taken: 1.466 seconds
To see the created Partitions:-
hive (hivedb)> show partitions multiplepartitiontable;
OK
city=Mumbai/dated=03%2F02%2F2017
city=Nagpur/dated=03%2F02%2F2017
city=Pune/dated=01%2F02%2F2017
city=Pune/dated=02%2F02%2F2017
Time taken: 0.198 seconds, Fetched: 4 row(s)
To view the content of the table:-
hive (hivedb)> select * from multiplepartitiontable;
OK
200 Sonu 5000 Mumbai 03/02/2017
201 Nik 3000 Mumbai 03/02/2017
202 Sam 4000 Mumbai 03/02/2017
203 Chotu 4000 Mumbai 03/02/2017
204 Aju 5000 Mumbai 03/02/2017
.
.
.
.
304 Chiku 1050 Pune 02/02/2017
305 Monu 3550 Pune 02/02/2017
306 Chutki 5550 Pune 02/02/2017
307 Tunu 3550 Pune 02/02/2017
308 Laddu 4350 Pune 02/02/2017
309 Tinku 4550 Pune 02/02/2017
310 Piku 2450 Pune 02/02/2017
Time taken: 4.57 seconds, Fetched: 101 row(s)
For the same example can you please explain dynamic partition
ReplyDeleteReally Good blog post.provided a helpful information. Big data hadoop online training Bangalore
ReplyDeletethakyou it vry nice blog for beginners
ReplyDeletehttps://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/
ReplyDeleteNice blog..! I really loved reading through this article. Thanks for sharing such a amazing post with us and keep blogging...
Hadoop training in Hyderabad
Hadoop online training in Hyderabad
Bigdata Hadoop online training in Hyderabad
It is really a great work and the way in which you are sharing the knowledge is excellent.
ReplyDeletebig data company in chennai
Thank you for your post. This is excellent information. It is amazing and wonderful to visit your site.
ReplyDeleteEven we r on same page. could u just give a review.
Hadoop online training
Best Hadoop online training
Hadoop online training in Hyderabad
Hadoop online training in india
You are doing a great job by sharing useful information about Hadoop course. It is one of the post to read and improve my knowledge in Hadoop.You can check our Hadoop Hive Partition example,for more information about Hadoop static partition in hive.
ReplyDeletevery nice post.
ReplyDeleteThank you for posting such an amazing blog.
big data Hadoop course
This comment has been removed by the author.
ReplyDelete