Hive Partitioning

Hive Partitioning

          Partition is horizontally dividing the data into number of slice in equal and manageable manner.
Every partition is stored as directory within data warehouse table.
          In data warehouse, this partition concept is common, but there are two types of Partitions are available in data warehouse concepts.
                       1. SQL Partition
                       2. Hive Partition


1. SQL Partition:-
            The main work of SQL Partition is dividing the large amount of data into number of slices by based on table single column only.
             SQL Partition is very hard comparing to Hive Partition because SQL server only support one column for Partition.
2. Hive Partition:-
             The main work of Hive Partition is also same as SQL Partition, but the main difference between SQL Partition and Hive Partition is SQL Partition only supports single column in table. While in Hive Partition, it supports multiple columns in a table.
Using Partitions can make it faster to do queries on slices of the data.
In Hive Partition concept there is two different type of partitions:-
        i.  Static Partition
        ii. Dynamic Partition
i. Static Partition
    In Static Partition, we statically add a partition in table and move the file into the partition of the table. We need to specify the partition column value in each and every LOAD statement.

Creation of Partitioned Table:-
            Syntax:-
                          CREATE TABLE TableName(ColName1 DataType1, ColName2 DataType2)
                          PARTITIONED BY (PartitionColumn1 DataType2, PartitionColumn2 DataType2)
                          ROW FORMAT DELIMITED
                          FIELDS TERMINATED BY 'delimiter character'
                          STORED AS <fileformat>;
hive (hivedb)> create table statictable (id int,name string,salary int)
             > partitioned by (city string)
             > row format delimited
             > fields terminated by '\t'
             > stored as textfile;
OK
Time taken: 23.871 seconds

In description, we can see the Partitioned column "city"
hive (hivedb)> describe statictable;
OK
id                      int                                         
name                    string                                      
salary                  int                                         
city                    string                                      
          
# Partition Information          
# col_name                data_type               comment             
          
city                    string                                      
Time taken: 1.498 seconds, Fetched: 9 row(s)

Loading the data in the table:- 
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data1.log' into table statictable PARTITION(city= 'Pune');
Loading data to table hivedb.statictable partition (city=Pune)
OK
Time taken: 15.667 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table statictable PARTITION(city= 'Mumbai');
Loading data to table hivedb.statictable partition (city=Mumbai)
OK
Time taken: 2.727 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data3.log' into table statictable PARTITION(city= 'Nagpur');
Loading data to table hivedb.statictable partition (city=Nagpur)
OK
Time taken: 1.76 seconds

To view the created Partitions:-
hive (hivedb)> show partitions statictable;
OK
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.939 seconds, Fetched: 3 row(s)

To view the content of the table:-
hive (hivedb)> select * from  statictable;
OK
200    Sonu    5000    Mumbai
201    Nik    3000    Mumbai
202    Sam    4000    Mumbai
203    Chotu    4000    Mumbai
.
.
.
224    Suruchi    42500    Pune
225    Vaibhav    28500    Pune
226    Kaustubh    57500    Pune
227    Jitesh    33500    Pune
228    Mayur    44500    Pune
229    Mohit    41500    Pune
Time taken: 0.677 seconds, Fetched: 80 row(s)

Adding a Partition:-
         Syntax:- ALTER TABLE TableName ADD PARTITION PartitionName;
hive (hivedb)> alter table statictable ADD PARTITION (city='Chennai');
OK
Time taken: 2.483 seconds
hive (hivedb)> show partitions statictable;
OK
city=Chennai
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.561 seconds, Fetched: 4 row(s)
New Partition is added "city=Chennai"

Renaming a Partition:-
  Syntax:- ALTER TABLE  PARTITION ExistingPartition RENAME TO PARTITION NewPartition
hive (hivedb)> alter table statictable PARTITION (city='Chennai') RENAME TO PARTITION (city='Indore');
OK
Time taken: 2.964 seconds
hive (hivedb)> show partitions statictable;
OK
city=Indore
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.549 seconds, Fetched: 4 row(s)
Partition is renamed from Chennai to Indore

Dropping a Partition:-
         Syntax:- ALTER TABLE TableName DROP PARTITION PartitionName;
hive (hivedb)> alter table statictable drop PARTITION (city='Indore');
Dropped the partition city=Indore
OK
Time taken: 5.042 seconds
hive (hivedb)> show partitions statictable;
OK
city=Mumbai
city=Nagpur
city=Pune
Time taken: 0.463 seconds, Fetched: 3 row(s)

Creation of Partition on multiple columns:-
hive (hivedb)> create table multiplepartitiontable (id int,name string,salary int)
             > partitioned by (city string,dated string)
             > row format delimited
             > fields terminated by '\t'
             > stored as textfile;
OK
Time taken: 0.637 seconds

In description, we can see the Partitioned column "city and dated" columns
hive (hivedb)> describe multiplepartitiontable;
OK
id                      int                                         
name                    string                                      
salary                  int                                         
city                    string                                      
dated                   string                                      
          
# Partition Information          
# col_name                data_type               comment             
          
city                    string                                      
dated                   string                                      
Time taken: 0.137 seconds, Fetched: 11 row(s)

Loading the data in the table:- 
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data1.log' into table multiplepartitiontable PARTITION(city= 'Pune',dated='01/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Pune, dated=01/02/2017)
OK
Time taken: 1.88 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table multiplepartitiontable PARTITION(city= 'Pune',dated='02/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Pune, dated=02/02/2017)
OK
Time taken: 1.478 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data3.log' into table multiplepartitiontable PARTITION(city= 'Nagpur',dated='03/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Nagpur, dated=03/02/2017)
OK
Time taken: 1.394 seconds
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table multiplepartitiontable PARTITION(city= 'Mumbai',dated='03/02/2017');
Loading data to table hivedb.multiplepartitiontable partition (city=Mumbai, dated=03/02/2017)
OK
Time taken: 1.466 seconds

To see the created Partitions:-
hive (hivedb)> show partitions multiplepartitiontable;
OK
city=Mumbai/dated=03%2F02%2F2017
city=Nagpur/dated=03%2F02%2F2017
city=Pune/dated=01%2F02%2F2017
city=Pune/dated=02%2F02%2F2017
Time taken: 0.198 seconds, Fetched: 4 row(s)

To view the content of the table:-
hive (hivedb)> select * from  multiplepartitiontable;
OK
200    Sonu    5000    Mumbai    03/02/2017
201    Nik    3000    Mumbai    03/02/2017
202    Sam    4000    Mumbai    03/02/2017
203    Chotu    4000    Mumbai    03/02/2017
204    Aju    5000    Mumbai    03/02/2017
.
.
.
.
304    Chiku    1050    Pune    02/02/2017
305    Monu    3550    Pune    02/02/2017
306    Chutki    5550    Pune    02/02/2017
307    Tunu    3550    Pune    02/02/2017
308    Laddu    4350    Pune    02/02/2017
309    Tinku    4550    Pune    02/02/2017
310    Piku    2450    Pune    02/02/2017
Time taken: 4.57 seconds, Fetched: 101 row(s)



Comments

  1. For the same example can you please explain dynamic partition

    ReplyDelete
  2. Really Good blog post.provided a helpful information. Big data hadoop online training Bangalore

    ReplyDelete
  3. thakyou it vry nice blog for beginners
    https://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/

    ReplyDelete


  4. Nice blog..! I really loved reading through this article. Thanks for sharing such a amazing post with us and keep blogging...

    Hadoop training in Hyderabad
    Hadoop online training in Hyderabad
    Bigdata Hadoop online training in Hyderabad

    ReplyDelete
  5. It is really a great work and the way in which you are sharing the knowledge is excellent.

    big data company in chennai

    ReplyDelete
  6. Thank you for your post. This is excellent information. It is amazing and wonderful to visit your site.
    Even we r on same page. could u just give a review.
    Hadoop online training
    Best Hadoop online training
    Hadoop online training in Hyderabad
    Hadoop online training in india

    ReplyDelete
  7. You are doing a great job by sharing useful information about Hadoop course. It is one of the post to read and improve my knowledge in Hadoop.You can check our Hadoop Hive Partition example,for more information about Hadoop static partition in hive.

    ReplyDelete
  8. very nice post.
    Thank you for posting such an amazing blog.

    big data Hadoop course

    ReplyDelete
  9. This comment has been removed by the author.

    ReplyDelete

Post a Comment