Indexing in Hive

Indexing in Hive

What is Index?
  • An Index acts as a reference (pointer) to the records.
  • Instead of searching all the records, we can refer to the index to search for a record.
  • Indexes to search for a record with minimum overheads.
  • Indexes also speed up the searching of records.
Why use Indexing in Hive?
  • Hive is a data warehousing tool present on the top of Hadoop, which provides the SQL kind of interface to perform queries on large datasets.
  • Since Hive deals with Big Data, the size of files is naturally large and can span up to TeraBytes, PetaBytes or even more.
  • In a Hive table, there are many numbers of rows & columns. If we want to perform queries only on some columns without Indexing, it will take a large amount of time because queries will be executed on all the columns present in the table.
  • The major advantage of using indexing is whenever we perform a query on a table that has an index, there is no need for the query to scan all the rows in the table. Further, it checks the index first and then goes to the particular column and perform the operation.
Type of Indexes in Hive:- 
           1. Compact Indexing
           2. Bitmap Indexing

Compact Indexing stores the pair of indexed columns value & its block id.

Bitmap Indexing stores the combination of indexed column value & list of rows as a bitmap.

Difference Between Compact & Bitmap Indexing:-
      The main difference is the storing of mapped values of the rows in the different blocks.

To Create the Compact Index:-
         Syntax: - CREATE INDEX IndexName
                            ON TABLE TableName(Columns..)
                            AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
                            WITH DEFERRED REBUILD;
Here org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler specifies that a built in CompactIndexHandler will act on the created index, which means we are creating a Compact Index for the table and WITH DEFERRED REBUILD statement should be present in the created index to alter the index in later stages.
hive (hivedb)> create index nyCompactIndex on table ny_cause_of_death_new(year)
             > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
             > with DEFERRED REBUILD;
OK
Time taken: 0.597 seconds

To View the Compact Index:-
           Syntax: - SHOW INDEX on IndexedTableName
hive (hivedb)> show index on ny_cause_of_death_new;
OK
nycompactindex          ny_cause_of_death_new    year                    hivedb__ny_cause_of_death_new_nycompactindex__    compact                 
Time taken: 0.152 seconds, Fetched: 1 row(s)
  To Drop the Compact Index:-
           Syntax: - DROP INDEX IndexName on IndexedTableName
hive (hivedb)> drop index nyCompactIndex on ny_cause_of_death_new;
OK
Time taken: 0.54 seconds
hive (hivedb)> show index on ny_cause_of_death_new;
OK
Time taken: 0.113 seconds

To Create the BITMAP Index:-
            Syntax: - CREATE INDEX IndexName
                            ON TABLE TableName(Columns..)
                            AS 'BITMAP'
                            WITH DEFERRED REBUILD;
Here, BITMAP defines the type of index as BITMAP.
hive (hivedb)> create index nyBitmapIndex on table ny_cause_of_death_new(year)
             > as 'BITMAP'
             > with DEFERRED REBUILD;
OK
Time taken: 0.599 seconds

To View the Bitmap Index:- 
           Syntax: - SHOW INDEX on IndexedTableName
hive (hivedb)> show index on ny_cause_of_death_new;
OK
nybitmapindex           ny_cause_of_death_new    year                    hivedb__ny_cause_of_death_new_nybitmapindex__    bitmap                  
Time taken: 0.105 seconds, Fetched: 1 row(s)

To Drop the Bitmap Index:-
           Syntax: - DROP INDEX IndexName on IndexedTableName
hive (hivedb)> drop index nyBitmapIndex on ny_cause_of_death_new;
OK
Time taken: 0.397 seconds
hive (hivedb)> show index on ny_cause_of_death_new;
OK
Time taken: 0.126 seconds

Comments

  1. Good Explanation...
    Really helpful for exploring additional information on HIVE..

    ReplyDelete
    Replies
    1. Indexing In Hive >>>>> Download Now

      >>>>> Download Full

      Indexing In Hive >>>>> Download LINK

      >>>>> Download Now

      Indexing In Hive >>>>> Download Full

      >>>>> Download LINK TI

      Delete
  2. Good Snehal... Really helpfull

    ReplyDelete
  3. Really Good one
    and very clear explanations!!!

    ReplyDelete
  4. Excellent article. Very interesting to read. I really love to read such a nice article. Thanks! keep rocking.Hadoop Admin Online Training India

    ReplyDelete
  5. thakyou it vry nice blog for beginners
    https://www.emexotechnologies.com/courses/big-data-analytics-training/big-data-hadoop-training/

    ReplyDelete
  6. It is really a great work and the way in which you are sharing the knowledge is excellent.

    big data company in chennai

    ReplyDelete
  7. As we know, AWS big data consultant is the future of the industries these days, this article helps me to figure out which language I need to learn to pursue the future in this field.

    ReplyDelete
  8. Indexing In Hive >>>>> Download Now

    >>>>> Download Full

    Indexing In Hive >>>>> Download LINK

    >>>>> Download Now

    Indexing In Hive >>>>> Download Full

    >>>>> Download LINK gF

    ReplyDelete

Post a Comment