Web Log Analysis using Hive with Regex

A server log is a log file automatically created and maintained by a server; it consists of a list of the activities the server performed. A typical example is a web server log, which maintains a history of page requests.
The W3C maintains a standard format (the Common Log Format) for web server log files. Each entry typically records information about the request, including the client IP address, the request date/time, the page requested, the HTTP status code, the bytes served, the user agent, and the referrer. This data can be combined into a single file, or separated into distinct logs such as an access log, error log, or referrer log.

Hive is a data warehouse infrastructure that provides data summarization and ad-hoc querying. Hive provides an SQL dialect, called Hive Query Language (HQL), for querying data stored in a Hadoop cluster.
Hive's data model provides a high-level, table-like structure on top of HDFS.

Web Log Format:-
64.242.88.10 - - [07/Mar/2014:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
IP address %h : (64.242.88.10) The IP address of the client (hostname) that made the request.

Logname %l : (-) The “hyphen” in the output indicates that the requested piece of information is not available.

Userid %u : (-) The userid of the person requesting the document, as determined by HTTP authentication. A hyphen indicates that this information is not available.

Timestamp %t : [07/Mar/2014:16:20:55 -0800] The time at which the server finished processing the request (see the parsing sketch after this field list).
                      The format is
                         [day/month/year:hour:minute:second zone]
                         day = 2*digit
                         month = 3*letter
                         year = 4*digit
                         hour = 2*digit
                         minute = 2*digit
                         second = 2*digit
                         zone = ('+' | '-') 4*digit

Request %r : "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" The request line sent by the client, consisting of the method (GET), the requested resource, and the protocol version.

Status code %s : 200 is the HTTP status code returned to the client. 
2xx is a successful response, 
3xx a redirection, 
4xx a client error,
5xx a server error.

Size of Object %b : 5253 is the size of the object returned to the client, measured in bytes.
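
As a quick illustration of the timestamp field, a single Common Log Format timestamp can be parsed in Hive with a SimpleDateFormat pattern. This is a minimal sketch assuming Hive's built-in unix_timestamp and from_unixtime functions; it is not part of the original script.

-- The bracket characters in the pattern are matched literally against the input.
SELECT unix_timestamp('[07/Mar/2014:16:20:55 -0800]', '[dd/MMM/yyyy:HH:mm:ss Z]');

-- Convert the resulting unix timestamp (seconds) back into a readable date string.
SELECT from_unixtime(unix_timestamp('[07/Mar/2014:16:20:55 -0800]', '[dd/MMM/yyyy:HH:mm:ss Z]'), 'yyyy-MM-dd');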


Regular Expression (Regex):-

           Refer to the Regular Expression tutorial to learn more about regular expressions.

We will use the regular expression below for log pattern matching.
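
This is the same pattern that appears in the table's storage parameters (input.regex) in the output further below, shown here as it appears in the SerDe property; each capture group corresponds to one field of the log line (IP address, logname, userid, timestamp, request, status, size):

([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)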

Download Input log_access file

In the HiveQL script below, we use the RegexSerDe class to process the log file with the help of the above regular expression.
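
The original logprochiveregex.hql file is not reproduced in this post; the following is a minimal sketch reconstructed from the DESCRIBE FORMATTED output shown further below. The local file path of the downloaded log and the final SELECT statement are assumptions, not the original statements.

-- logprochiveregex.hql (sketch): create a table over the raw log lines using RegexSerDe
DROP TABLE IF EXISTS log_processing;

CREATE TABLE log_processing (
  ipaddress STRING,
  logname   STRING,
  userid    STRING,
  `time`    STRING,
  request   STRING,
  status    STRING,
  size      STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;

-- Load the downloaded access log into the table (the local path is an assumption)
LOAD DATA LOCAL INPATH '/home/hduser/HIVE/log_access' INTO TABLE log_processing;

-- Print the table definition (this produces the "Detailed Table Information" block below)
DESCRIBE FORMATTED log_processing;

-- Placeholder query; the exact final query used in the original run is not shown
SELECT * FROM log_processing LIMIT 10;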

Execution of the Hive script logprochiveregex.hql
[hduser@localhost bin]$ hive -f /home/hduser/HIVE/logprochiveregex.hql

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/hadoop-2.6.0/hive/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
OK
Time taken: 8.815 seconds
OK
Time taken: 1.496 seconds
Loading data to table default.log_processing
OK
Time taken: 2.1 seconds
OK
# col_name             data_type            comment             
    
ipaddress            string                                   
logname              string                                   
userid               string                                   
time                 string                                   
request              string                                   
status               string                                   
size                 string                                   
    
# Detailed Table Information    
Database:            default               
Owner:               hduser                
CreateTime:          Fri Mar 24 09:32:53 PDT 2017  
LastAccessTime:      UNKNOWN               
Retention:           0                     
Location:            hdfs://localhost:9000/user/hive/warehouse/log_processing  
Table Type:          MANAGED_TABLE         
Table Parameters:    
 numFiles             1                   
 numRows              0                   
 rawDataSize          0                   
 totalSize            174447              
 transient_lastDdlTime 1490373175          
    
# Storage Information    
SerDe Library:       org.apache.hadoop.hive.serde2.RegexSerDe  
InputFormat:         org.apache.hadoop.mapred.TextInputFormat  
OutputFormat:        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  
Compressed:          No                    
Num Buckets:         -1                    
Bucket Columns:      []                    
Sort Columns:        []                    
Storage Desc Params:    
 input.regex          ([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)
 output.format.string %1$s %2$s %3$s %4$s %5$s %6$s %7$s
 serialization.format 1                   
Time taken: 0.861 seconds, Fetched: 37 row(s)
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20170324093243_e4985fc8-d3d5-4eda-8b47-80a7e1911b63
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1490354948174_0003, Tracking URL = http://localhost:8088/proxy/application_1490354948174_0003/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1490354948174_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-03-24 09:36:07,239 Stage-1 map = 0%,  reduce = 0%
2017-03-24 09:37:07,594 Stage-1 map = 0%,  reduce = 0%
2017-03-24 09:38:08,660 Stage-1 map = 0%,  reduce = 0%
2017-03-24 09:39:03,514 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 41.82 sec
2017-03-24 09:39:05,651 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 42.37 sec
2017-03-24 09:40:06,470 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 45.91 sec
MapReduce Total cumulative CPU time: 45 seconds 910 msec
Ended Job = job_1490354948174_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 45.91 sec   HDFS Read: 184082 HDFS Write: 1423 SUCCESS
Total MapReduce CPU Time Spent: 45 seconds 910 msec
OK
64.242.88.10 - - [07/Mar/2014:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2014:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2014:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2014:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2014:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2014:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.12&param2=1.12 HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2014:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
64.242.88.10 - - [07/Mar/2014:16:29:16 -0800] "GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2014:16:30:29 -0800] "GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2014:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
Time taken: 432.103 seconds, Fetched: 10 row(s)
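
With the table in place, further analysis is ordinary HQL against log_processing. A couple of hedged examples follow; they are not part of the original script. The column names follow the table definition above, and the month filter relies on the timestamp format described earlier.

-- Count requests per HTTP status code
SELECT status, COUNT(*) AS hits
FROM log_processing
GROUP BY status;

-- Retrieve only the requests made in March 2014 by matching the timestamp column
SELECT ipaddress, `time`, request
FROM log_processing
WHERE `time` LIKE '%/Mar/2014%';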

Comments

  1. Is this only possible with Hive, HQL and Hadoop? What if I am using Hibernate or any other ORM and any other SQL database?

  2. I don't have any idea about Hibernate or other ORMs. But you can also process your log file using Python.

  3. There are many entries like [07/Mar/2014:16:20:55 -0800] across the year, right?
    How do I apply a query to retrieve only the data for March, or for a particular date or day of the week?
