Replacing Telco DB/DW to
                           Hadoop and Hive


                                JunHo Cho

                          Data Analysis Platform Team




Friday, July 1, 2011
•   Cloud Computing Platform - Xen

•   Cloud Storage Platform - Hadoop

•   Massive Email Archiving Solution - Hadoop, Lucene

    •   Hive: social network analysis using email

•   Log Archiving Solution - Hadoop


•   Data Analysis
        data mining, machine learning, statistics

•   Data Platform - Hadoop, Lucene, Hive

•   Cloud Architecture - KT Cloud

Telco Data




OpenSource

•   Storage & Computing

•   Collection

•   Search

•   Analysis

•   Coordination

Hive Internal



Hive Architecture

                       UI         Driver   select col1 from tab1 where ...

                       DDL           HQL
                                                   Execution
                                             Works
                                                    Engine
            MetaStore             Compiler
                            ORM                     Hadoop
                                             Result

                       Result shown in the UI:
                                    a 123344
                                    b 121211
                                    c 342434

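
The flow in the Hive Architecture diagram can be sketched as a toy model in Python. This is not Hive code: the class names mirror the boxes in the diagram, while the method names, the pre-parsed query arguments, and the sample rows are all assumptions made for illustration.

```python
# Toy model of the query flow in the Hive Architecture diagram.
# Not Hive code: classes mirror the boxes, everything else is hypothetical.

class MetaStore:
    """Holds table schemas (the DDL side of the diagram)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, columns):
        self.tables[name] = list(columns)

    def columns(self, name):
        return self.tables[name]


class Compiler:
    """Turns an already-parsed query into a list of work items."""

    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, table, column):
        # Ask the MetaStore for the schema to validate the column.
        if column not in self.metastore.columns(table):
            raise ValueError(f"unknown column {column} in {table}")
        return [("scan", table), ("select", column)]


class ExecutionEngine:
    """Stands in for Hadoop: runs the works over in-memory rows."""

    def __init__(self, data):
        self.data = data

    def run(self, works):
        rows = []
        for op, arg in works:
            if op == "scan":
                rows = self.data[arg]
            elif op == "select":
                rows = [row[arg] for row in rows]
        return rows


class Driver:
    """Receives the query from the UI and coordinates the other parts."""

    def __init__(self, compiler, engine):
        self.compiler = compiler
        self.engine = engine

    def execute(self, table, column):
        works = self.compiler.compile(table, column)
        return self.engine.run(works)


# Made-up sample table; the rows are not taken from the slides.
metastore = MetaStore()
metastore.create_table("tab1", ["col1", "col2", "col3"])
engine = ExecutionEngine({"tab1": [
    {"col1": "a", "col2": 1, "col3": 7},
    {"col1": "b", "col2": 2, "col3": 8},
    {"col1": "c", "col2": 3, "col3": 9},
]})
driver = Driver(Compiler(metastore), engine)
result = driver.execute("tab1", "col1")
```

In this sketch the Driver never touches the MetaStore directly; only the Compiler does, which matches the MetaStore/Compiler pairing in the diagram.
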
Hive Internal

    Clients          Web UI, Hive CLI, JDBC  (Browse, Query, DDL)
    MetaStore        Thrift API
    Hive QL          Parser, Plan, Optimizer, Task
    Map Reduce       TSOperator, SELOperator, FSOperator
                     ExecMapper/ExecReducer
                     User Script
                     UDF/UDAF (substr, sum, average)
    SerDe
    Input/OutputFormat
    Storage          HDFS (RCFile)
                     StorageHandler (DB, ..., HBase)

Parser

    Select col1,col2 From tab1 Where col3 > 5

    TOK_QUERY
        TOK_FROM
            TOK_TABNAME
                tab1
        TOK_INSERT
            TOK_DESTINATION
                TOK_DIR
                    TOK_TMP_FILE
            TOK_SELECT
                TOK_SELEXPR
                    TOK_TABLE_OR_COL
                        col1
                TOK_SELEXPR
                    TOK_TABLE_OR_COL
                        col2
            TOK_WHERE
                >
                    TOK_TABLE_OR_COL
                    5

    QB is filled in while walking the tree, in this order:
    TOK_QUERY, tab1, insclause-0, col1, col2, TOK_WHERE

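
The walk from the parse tree to the QB can be sketched roughly as below. The token names come from the slide; the tuple encoding, the `build_qb` function, and the dictionary layout of the QB are assumptions for illustration, not Hive's actual data structures. The `col3` leaf is filled in from the query text.

```python
# AST for: Select col1,col2 From tab1 Where col3 > 5
# Encoded as nested (token, child, ...) tuples; leaves are plain strings.
AST = ("TOK_QUERY",
       ("TOK_FROM", ("TOK_TABNAME", "tab1")),
       ("TOK_INSERT",
        ("TOK_DESTINATION", ("TOK_DIR", "TOK_TMP_FILE")),
        ("TOK_SELECT",
         ("TOK_SELEXPR", ("TOK_TABLE_OR_COL", "col1")),
         ("TOK_SELEXPR", ("TOK_TABLE_OR_COL", "col2"))),
        ("TOK_WHERE", (">", ("TOK_TABLE_OR_COL", "col3"), "5"))))


def build_qb(node, qb=None, clause="insclause-0"):
    """Walk the AST and record each clause in a QB-like dictionary.

    The clause name "insclause-0" is the label shown on the slide; the
    dictionary keys are hypothetical.
    """
    if qb is None:
        qb = {"aliases": [], "dest": {}, "select": {}, "where": {}}
    if isinstance(node, str):
        return qb  # leaf with nothing to record
    token, *children = node
    if token == "TOK_TABNAME":
        qb["aliases"].append(children[0])
    elif token == "TOK_DESTINATION":
        qb["dest"][clause] = node
    elif token == "TOK_SELECT":
        # Each child is (TOK_SELEXPR, (TOK_TABLE_OR_COL, name)).
        qb["select"][clause] = [sel[1][1] for sel in children]
    elif token == "TOK_WHERE":
        qb["where"][clause] = children[0]
    else:
        # Structural tokens (TOK_QUERY, TOK_FROM, TOK_INSERT): recurse.
        for child in children:
            build_qb(child, qb, clause)
    return qb


qb = build_qb(AST)
```

The recursion visits TOK_FROM before TOK_INSERT, so the table alias is recorded first and the destination, select list, and predicate follow, as in the slide builds.
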
Plan

    Select col1,col2 From tab1 Where col3 > 5

    QB

        TOK_FROM            TableScanOperator

        TOK_WHERE           FilterOperator

        TOK_SELECT          SelectOperator

        TOK_DESTINATION     FileSinkOperator

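
A minimal sketch of the clause-to-operator mapping shown on the Plan slide. The operator names are the ones on the slide; the `Operator` class, the `gen_plan` and `chain` helpers, and the dictionary form of the QB are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    conf: object = None
    children: list = field(default_factory=list)


# Clause order follows the slide: FROM, WHERE, SELECT, DESTINATION.
CLAUSE_TO_OPERATOR = [
    ("from", "TableScanOperator"),
    ("where", "FilterOperator"),
    ("select", "SelectOperator"),
    ("dest", "FileSinkOperator"),
]


def gen_plan(qb):
    """Build a linear operator chain from the clauses present in qb."""
    root = None
    tail = None
    for clause, op_name in CLAUSE_TO_OPERATOR:
        if clause not in qb:
            continue  # e.g. no WHERE clause means no FilterOperator
        op = Operator(op_name, qb[clause])
        if tail is None:
            root = op
        else:
            tail.children.append(op)
        tail = op
    return root


def chain(op):
    """Return the operator names from the root down to the sink."""
    names = []
    while op is not None:
        names.append(op.name)
        op = op.children[0] if op.children else None
    return names


qb = {"from": "tab1", "where": "col3 > 5",
      "select": ["col1", "col2"], "dest": "TOK_TMP_FILE"}
plan = gen_plan(qb)
```

Calling `chain(plan)` lists the four operators in the order shown on the slide. A query without a WHERE clause would simply skip the FilterOperator in this model.
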
Optimizer

    Select col1,col2 From tab1 Where col3 > 5

    tab1 {col1, col2, col3, col4, col5, col6, col7}

    ColumnPruner walks the operator tree from the bottom up, carrying a
    Context (FIL, TS, SEL) that records the columns each operator needs:

        FileSinkOperator
        SelectOperator       SEL   col1, col2
        FilterOperator       FIL   col1, col2, col3
        TableScanOperator    TS    col1, col2, col3

    Operator tree after the optimizer:

        TableScanOperator
        FilterOperator
        FilterOperator
        SelectOperator
        FileSinkOperator

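
The bottom-up column pruning pass can be sketched as follows. This is a simplified illustration, not Hive's ColumnPruner: the `PLAN` list, the `prune_columns` function, and the idea of tagging each operator with the columns it references itself are assumptions.

```python
# Columns of tab1, as listed on the slide.
TABLE_COLUMNS = ["col1", "col2", "col3", "col4", "col5", "col6", "col7"]

# Operators listed top-down, each with the columns it references itself.
PLAN = [
    ("TS", set()),               # TableScanOperator
    ("FIL", {"col3"}),           # FilterOperator: col3 > 5
    ("SEL", {"col1", "col2"}),   # SelectOperator: col1, col2
    ("FS", set()),               # FileSinkOperator
]


def prune_columns(plan):
    """Walk the plan bottom-up and compute the columns each operator needs.

    An operator needs its own columns plus everything required by the
    operators below it.
    """
    needed = {}
    below = set()
    for name, own in reversed(plan):
        below = below | own
        needed[name] = sorted(below)
    return needed


context = prune_columns(PLAN)

# The table scan only has to read the columns that survived pruning.
scan_columns = [c for c in TABLE_COLUMNS if c in context["TS"]]
```

With this input the scan reads three of the seven columns, matching the `TS col1, col2, col3` entry on the slide.
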
Task
                         Task   Select col1,col2 From tab1 Where col3 > 5

                                                       TS - GenMRTableScan1
                                     TaskFactory
                                                       FS - GenMRFileSink1
                       QB




Task
    Select col1,col2 From tab1 Where col3 > 5

    QB, TaskFactory
    Tasks: MapRedTask, FetchTask

    Operators in MapRedTask:
        TableScanOperator
        FilterOperator
        FilterOperator
        SelectOperator
        FileSinkOperator

Hive Internal

    Web UI, Hive CLI, JDBC: Browse, Query, DDL
    MetaStore, Thrift API
    Hive QL: Parser, Plan, Optimizer, Task
    Map Reduce: TSOperator, FILOperator, FILOperator, SELOperator, FSOperator,
                User Script, UDF
    ExecMapper/ExecReducer, SerDe, Input/OutputFormat
    HDFS (RCFile); StorageHandler (DB, ..., HBase)

Oracle Migration to Hive



Understand Oracle SQL


                       • More than 3,000 ETL SQL statements
                       • Understand the data flow
                       • Group similar SQL patterns
                       • Investigate which Oracle functions are used

Oracle SQL



Data Model Convert

                       Oracle          Hive
                       Table           Table
                       Partition       Partition
                       Sampling        Bucket

DataType Convert

               Oracle           Hive
               NUMBER(n)        TINYINT / INT / BIGINT
               NUMBER(n,m)      FLOAT / DOUBLE
               VARCHAR2         STRING
               DATE             STRING in “yyyy-MM-dd HH:mm:ss” format

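The type table above can be sketched as a small lookup routine. This is only an illustration: `OracleTypeMapper` is a hypothetical helper, and the precision cutoffs (2, 9 and 18 digits for the integer types, 7 digits for FLOAT) are assumptions chosen so the values always fit; the slides do not state the thresholds that were actually used.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive or of the original deck.
final class OracleTypeMapper {
    private static final Pattern NUMBER =
        Pattern.compile("NUMBER\\((\\d+)(?:,\\s*(\\d+))?\\)");

    // Maps an Oracle column type to a Hive type, following the table above.
    static String toHiveType(String oracleType) {
        String t = oracleType.trim().toUpperCase();
        Matcher m = NUMBER.matcher(t);
        if (m.matches()) {
            int precision = Integer.parseInt(m.group(1));
            boolean hasScale = m.group(2) != null && Integer.parseInt(m.group(2)) > 0;
            if (hasScale) {
                // Assumed cutoff: a float keeps about 7 significant decimal digits.
                return precision <= 7 ? "FLOAT" : "DOUBLE";
            }
            // Assumed cutoffs: 2, 9 and 18 digits always fit the signed ranges.
            if (precision <= 2) return "TINYINT";
            if (precision <= 9) return "INT";
            if (precision <= 18) return "BIGINT";
            throw new IllegalArgumentException("Too wide for BIGINT: " + oracleType);
        }
        if (t.startsWith("VARCHAR2") || t.equals("DATE")) {
            return "STRING";
        }
        throw new IllegalArgumentException("Unmapped Oracle type: " + oracleType);
    }

    // Renders a date in the string layout used for DATE columns.
    static String toHiveDateString(Date d) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(d);
    }

    public static void main(String[] args) {
        System.out.println(toHiveType("NUMBER(5)"));
        System.out.println(toHiveType("NUMBER(10,2)"));
        System.out.println(toHiveType("VARCHAR2(30)"));
    }
}
```

Storing dates as strings in this fixed layout keeps them lexicographically sortable, which is one plausible reason for the choice.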
HIVE DML

                       • Hive supports a subset of ANSI SQL
                       • Subqueries are supported only in the FROM clause
                       • Join queries: equi-join/inner join
                                   outer join
                                   self-join

IN Clause
             IN SubQuery (Oracle)
              SELECT * FROM Employee e
              WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d)

             LEFT SEMI JOIN (Hive)
              SELECT * FROM Employee e
              LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)

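Why a semi join rather than a plain join replaces IN can be shown with a small in-memory model. The sketch below is hypothetical (lists of DeptNo values stand in for the Employee and Dept tables); it only illustrates that an inner join repeats an employee row once per matching Dept row, while a left semi join emits it at most once, as IN does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; names are illustrative, not taken from Hive.
final class SemiJoinSketch {

    // Inner join on DeptNo: one output row per matching (employee, dept) pair.
    static List<Integer> innerJoin(List<Integer> empDeptNos, List<Integer> deptNos) {
        List<Integer> out = new ArrayList<>();
        for (Integer e : empDeptNos) {
            for (Integer d : deptNos) {
                if (e != null && e.equals(d)) out.add(e);
            }
        }
        return out;
    }

    // Left semi join: each employee row is emitted at most once, like IN.
    static List<Integer> leftSemiJoin(List<Integer> empDeptNos, List<Integer> deptNos) {
        Set<Integer> keys = new HashSet<>(deptNos);
        List<Integer> out = new ArrayList<>();
        for (Integer e : empDeptNos) {
            if (e != null && keys.contains(e)) out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> emp = Arrays.asList(10, 20, 30);
        List<Integer> dept = Arrays.asList(10, 10, 20); // DeptNo 10 appears twice
        System.out.println(innerJoin(emp, dept));
        System.out.println(leftSemiJoin(emp, dept));
    }
}
```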
NOT IN Clause
             NOT IN SubQuery (Oracle)
              SELECT * FROM Employee e
              WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)

             LEFT OUTER JOIN + IS NULL (Hive)
              SELECT e.* FROM Employee e
              LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo)
              WHERE d.DeptNo IS NULL

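One caveat worth checking during such a rewrite is NULL handling: under standard SQL three-valued logic, NOT IN and the outer-join form agree only when DeptNo is never NULL on either side. The sketch below models both predicates in plain Java; `NotInSketch` and its method names are hypothetical, and the code reflects standard SQL semantics rather than anything measured in the original migration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two predicates; null stands for SQL NULL.
final class NotInSketch {

    // x NOT IN (values): a row passes only if the predicate is TRUE, not UNKNOWN.
    static boolean notIn(Integer x, List<Integer> values) {
        if (values.isEmpty()) return true;   // NOT IN over an empty set is TRUE
        if (x == null) return false;         // UNKNOWN, so the row is filtered out
        boolean sawNull = false;
        for (Integer v : values) {
            if (v == null) sawNull = true;
            else if (v.equals(x)) return false;
        }
        return !sawNull;                     // a NULL in the list makes it UNKNOWN
    }

    // LEFT OUTER JOIN ... WHERE d.DeptNo IS NULL keeps every unmatched left row.
    static boolean antiJoinKeeps(Integer x, List<Integer> values) {
        if (x == null) return true;          // NULL never matches, so the row is kept
        for (Integer v : values) {
            if (v != null && v.equals(x)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> depts = Arrays.asList(10, 20);
        System.out.println(notIn(30, depts) + " " + antiJoinKeeps(30, depts));
        System.out.println(notIn(null, depts) + " " + antiJoinKeeps(null, depts));
    }
}
```

If the join column can be NULL, an extra `e.DeptNo IS NOT NULL` filter (or an equivalent check on Dept) would be needed to keep the two forms equivalent.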
JOIN Operator
              JOIN (Oracle)
              SELECT *
              FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id

              JOIN ... ON (Hive)
              SELECT *
              FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)

Oracle Function



Functions

                           Oracle                     Hive
     Math Function         round, ceil, mod,          round, ceil, pmod,
                           power, sqrt, sin/cos       power, sqrt, sin/cos

     Character Function    substr, trim, lpad/rpad,   substr, trim, lpad/rpad,
                           ltrim/rtrim, replace       ltrim/rtrim, regexp_replace

     NULL Function         coalesce, nvl, nvl2        coalesce (no NVL, NVL2)

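The mod to pmod mapping is not a pure rename: a remainder that follows the sign of the dividend (Java's `%`, and Oracle MOD as far as integer inputs are concerned) differs from a positive modulus for negative inputs. A minimal sketch, with `ModSketch` as a hypothetical name and the pmod formula written out as an assumption about how a positive modulus is usually computed:

```java
// Hypothetical illustration of remainder vs. positive modulus.
final class ModSketch {

    // Remainder that keeps the sign of the dividend.
    static int mod(int a, int b) {
        return a % b;
    }

    // Positive modulus: the result is non-negative whenever b is positive.
    static int pmod(int a, int b) {
        return ((a % b) + b) % b;
    }

    public static void main(String[] args) {
        System.out.println(mod(-7, 3));
        System.out.println(pmod(-7, 3));
    }
}
```

For non-negative inputs the two agree, so the substitution is safe only where negative values cannot occur or where the positive result is what the ETL logic wants.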
Custom UDF Functions
                       •   Condition Function
                           •   DECODE, GREATEST
                       •   Null Comparison Function
                           •   NVL / NVL2
                       •   Type Conversion
                           •   TO_NUMBER
                           •   TO_CHAR
                           •   TO_DATE
                           •   INSTR4
                           •   DATE_FORMAT
                           •   LAST_DAY

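The core logic of a few of these functions is small. The sketch below shows NVL, NVL2 and DECODE as plain static methods; `OracleCompat` is a hypothetical class, not the code from the project. In Hive of that period such logic would typically be wrapped in a class extending `org.apache.hadoop.hive.ql.exec.UDF` with `evaluate` methods; that wrapper is left out here so the example stays self-contained.

```java
import java.util.Objects;

// Hypothetical plain-Java versions of three Oracle functions.
final class OracleCompat {

    // NVL(value, fallback): fallback when value is NULL.
    static <T> T nvl(T value, T fallback) {
        return value != null ? value : fallback;
    }

    // NVL2(value, ifNotNull, ifNull): picks a branch on NULL-ness of value.
    static <T> T nvl2(Object value, T ifNotNull, T ifNull) {
        return value != null ? ifNotNull : ifNull;
    }

    // DECODE(expr, search1, result1, ..., [default]).
    // Like Oracle's DECODE, two NULLs are treated as equal here.
    static Object decode(Object expr, Object... pairs) {
        int i = 0;
        for (; i + 1 < pairs.length; i += 2) {
            if (Objects.equals(expr, pairs[i])) return pairs[i + 1];
        }
        // A trailing unpaired argument is the default; otherwise NULL.
        return i < pairs.length ? pairs[i] : null;
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "n/a"));
        System.out.println(nvl2("x", "set", "unset"));
        System.out.println(decode(2, 1, "one", 2, "two", "other"));
    }
}
```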
Oracle Analytic Function



Analytic Function
     RANK (Oracle)
      SELECT name, dept, salary,
             RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
      FROM emp

     RANK(arg1,arg2) - Custom UDF (Hive)
      SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary)
      FROM (SELECT name, dept, salary FROM emp
            DISTRIBUTE BY dept SORT BY dept, salary DESC) e

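The slides do not show the body of the custom RANK UDF. One plausible shape, sketched below under that caveat, is a stateful function that remembers the previous partition key and ordering value: because DISTRIBUTE BY sends all rows of a dept to the same reducer and SORT BY orders them there, the function sees each partition as one sorted run. `RankSketch` and its fields are hypothetical.

```java
// Hypothetical stateful rank; assumes rows arrive grouped by key and sorted.
final class RankSketch {
    private String lastKey;        // partition key of the previous row
    private Double lastValue;      // ordering value of the previous row
    private int rowsInPartition;   // rows seen so far in the current partition
    private int rank;              // rank assigned to the previous row

    // Returns the rank of the current row, with gaps after ties like SQL RANK().
    int rank(String key, double value) {
        if (!key.equals(lastKey)) {
            // A new partition starts: reset the counters.
            lastKey = key;
            lastValue = null;
            rowsInPartition = 0;
        }
        rowsInPartition++;
        if (lastValue == null || lastValue.doubleValue() != value) {
            rank = rowsInPartition;   // value changed: rank jumps to the row number
        }
        lastValue = value;
        return rank;
    }

    public static void main(String[] args) {
        RankSketch r = new RankSketch();
        System.out.println(r.rank("A", 300));
        System.out.println(r.rank("A", 300));
        System.out.println(r.rank("A", 200));
        System.out.println(r.rank("B", 500));
    }
}
```

The approach depends entirely on input order, so it gives wrong results if the subquery's DISTRIBUTE BY / SORT BY is dropped or if the function is evaluated on the map side before the shuffle.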
Analytic Aggregation Function
      MIN (Oracle)
      SELECT dept, MIN(salary) OVER (PARTITION BY dept)
      FROM emp

      Aggregation + JOIN (Hive)
      SELECT emp.dept, tmp.m
      FROM emp JOIN (SELECT dept, MIN(salary) m
                     FROM emp GROUP BY dept) tmp ON emp.dept = tmp.dept

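The aggregation-plus-join rewrite can be read as two passes over the data. The following sketch mirrors that in memory: first a GROUP BY that computes the minimum per dept, then a join that attaches the minimum to every original row. `MinOverPartitionSketch` and the parallel-array input are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass model of MIN(salary) OVER (PARTITION BY dept).
final class MinOverPartitionSketch {

    // depts[i] and salaries[i] together form row i of emp.
    static List<String> minPerRow(String[] depts, int[] salaries) {
        // Pass 1, the GROUP BY subquery: minimum salary per dept.
        Map<String, Integer> minByDept = new HashMap<>();
        for (int i = 0; i < depts.length; i++) {
            minByDept.merge(depts[i], salaries[i], Integer::min);
        }
        // Pass 2, the join: every input row gets its dept's minimum.
        List<String> out = new ArrayList<>();
        for (String dept : depts) {
            out.add(dept + "," + minByDept.get(dept));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            minPerRow(new String[] {"A", "A", "B"}, new int[] {300, 200, 500}));
    }
}
```

The output keeps one row per input row, which is what distinguishes the analytic form from a plain GROUP BY.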
Hive Internal



Merge Join Tree Bug

                       • select * from a join b on a.v1 = b.v1
                         join c on a.v1 = c.v1
                         join d on a.v1 = d.v1
                         join e on a.v2 = e.v2
                         MapReduce #3

                       • select * from a join e on a.v2 = e.v2
                         join c on a.v1 = c.v1
                         join d on a.v1 = d.v1
                         join b on a.v1 = b.v1
                         MapReduce #2

Merge Join Tree Bug Fix
                       • SemanticAnalyzer (before the fix)
                          private void mergeJoinTree(QB qb) {
                              QBJoinTree root = qb.getQbJoinTree();
                              QBJoinTree parent = null;
                              while (root != null) {
                                  boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());

                                  if (parent == null) {
                                      if (merged) {
                                          root = qb.getQbJoinTree();
                                      } else {
                                          parent = root;
                                          root = root.getJoinSrc();
                                      }
                                  } else {
                                      // The merge result is ignored once parent is set.
                                      parent = parent.getJoinSrc();
                                      root = parent.getJoinSrc();
                                  }
                              }
                          }

Merge Join Tree Bug Fix
                       • SemanticAnalyzer (after the fix)
                          private void mergeJoinTree(QB qb) {
                              QBJoinTree root = qb.getQbJoinTree();
                              QBJoinTree parent = null;
                              while (root != null) {
                                  boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());

                                  if (parent == null) {
                                      if (merged) {
                                          root = qb.getQbJoinTree();
                                      } else {
                                          parent = root;
                                          root = root.getJoinSrc();
                                      }
                                  } else {
                                      if (merged) {
                                          // Restart from the top after a successful merge.
                                          root = qb.getQbJoinTree();
                                      } else {
                                          parent = parent.getJoinSrc();
                                          root = parent.getJoinSrc();
                                      }
                                  }
                              }
                          }

New HQL Syntax
      INSERT INTO
      INSERT INTO table VALUES(col1 ... coln)
      SELECT ... FROM tmp ...

          • INSERT [OVERWRITE] destination
           • grammar
           • modify FileSinkPlan
          • New Feature - HIVE-306
           • INSERT INTO destination
Tuning
              • Hadoop Tuning
                  •    mapred.job.reuse.jvm.num.tasks
                  •    mapred.child.java.opts
                  •    mapred.min.split.size / mapred.max.split.size
                  •    dfs.block.size
              • Hive Tuning
                  •    hive.input.format = CombineHiveInputFormat
                  •    Query tuning: reduce the number of MapReduce jobs,
                                      using the HQL plan

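How the split-size and block-size settings interact can be approximated with a short calculation. The sketch below assumes the rule `max(minSize, min(maxSize, blockSize))`, which is how Hadoop's newer `FileInputFormat` chooses a split size as far as I know; it ignores per-file boundaries and the small slop Hadoop allows on the last split, so the task count is only an estimate. `SplitSizeSketch` is a hypothetical name.

```java
// Hypothetical estimate of map task count from split and block settings.
final class SplitSizeSketch {

    // Assumed rule: the block size, clamped between the min and max split size.
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of map tasks for one file of the given size (ceiling division).
    static long estimateMapTasks(long fileBytes, long splitSize) {
        return (fileBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 1024L * mb;                          // a 1 GB input file
        long small = splitSize(1L, Long.MAX_VALUE, 64L * mb);
        long large = splitSize(256L * mb, Long.MAX_VALUE, 64L * mb);
        System.out.println(estimateMapTasks(file, small));
        System.out.println(estimateMapTasks(file, large));
    }
}
```

Raising the minimum split size above the block size reduces the number of map tasks at the cost of data locality, which is the trade-off these settings control.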
Wrap-Up
             Oracle 2 Hive
                Gain insight into the data flow & model
                Modify Oracle SQL to Hive query syntax
                Use built-in functions
                Develop custom UDF/UDAF/UDTF
                Support analytic functions
                  - distribute by + sort by + udf
                  - join + udf (aggregation)
                Modify Hive internals
                Hadoop + Hive tuning

Questions?



Replacing Telco DB/DW to Hadoop and Hive

  • 1. Replacing Telco DB/DW to Hadoop and Hive JunHo Cho Data Analysis Platform Team Friday, July 1, 2011
  • 2. Cloud Computing Platform - Xen • Cloud Storage Platform - Hadoop • Massive Email Archiving Solution - Hadoop, Lucene • Hive: social network analysis using email • Log Archiving Solution - Hadoop • Data Analysis: data mining, machine learning, data statistics • Data Platform - Hadoop, Lucene, Hive • Cloud Architecture - KT Cloud
  • 20. OpenSource: Storage & Computing
  • 22. OpenSource: Collection
  • 24. OpenSource: Search
  • 26. OpenSource: Analysis
  • 28. OpenSource: Coordination
  • 34. Hive Architecture (diagram labels): UI, Driver, DDL, HQL, Execution Engine, Works, MetaStore, Compiler, ORM, Hadoop, Result
  • 35. Hive Architecture, example query: select col1 from tab1 where ...
  • 39. Hive Architecture, example result rows: a 123344, b 121211, c 342434
  • 40. Hive Internal (diagram labels): Web UI, Hive CLI, JDBC (Browse, Query, DDL); MetaStore; Thrift API; Hive QL (Parser, Plan, Optimizer, Task); Map Reduce (TSOperator, SELOperator, FSOperator, ExecMapper/ExecReducer); User Script; UDF/UDAF (substr, sum, average); SerDe; Input/OutputFormat; StorageHandler; RCFile; HDFS; DB; HBase; ...
  • 48. Parser: Select col1,col2 From tab1 Where col3 > 5 is parsed into the AST (TOK_QUERY (TOK_FROM (TOK_TABNAME tab1)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL col1)) (TOK_SELEXPR (TOK_TABLE_OR_COL col2))) (TOK_WHERE (> (TOK_TABLE_OR_COL col3) 5)))). Walking the tree fills the QB with tab1, insclause-0, col1 and col2.
  • 59. Plan: Select col1,col2 From tab1 Where col3 > 5. Each QB clause becomes an operator: TOK_FROM -> TableScanOperator, TOK_WHERE -> FilterOperator, TOK_SELECT -> SelectOperator, TOK_DESTINATION -> FileSinkOperator
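The clause-to-operator mapping on the Plan slides can be sketched in a few lines of Python. This is only an illustration of the mapping, not Hive's planner: the dictionary-based query block and the function name `build_operator_chain` are hypothetical.

```python
# Clause-to-operator mapping as listed on the Plan slides.
CLAUSE_TO_OPERATOR = {
    "TOK_FROM": "TableScanOperator",
    "TOK_WHERE": "FilterOperator",
    "TOK_SELECT": "SelectOperator",
    "TOK_DESTINATION": "FileSinkOperator",
}

# Order in which the slides attach the operators.
PLAN_ORDER = ["TOK_FROM", "TOK_WHERE", "TOK_SELECT", "TOK_DESTINATION"]


def build_operator_chain(query_block):
    """Return (operator, clause payload) pairs for the clauses present in the query block."""
    return [
        (CLAUSE_TO_OPERATOR[clause], query_block[clause])
        for clause in PLAN_ORDER
        if clause in query_block
    ]


# Hypothetical query block for: Select col1,col2 From tab1 Where col3 > 5
qb = {
    "TOK_FROM": "tab1",
    "TOK_WHERE": "col3 > 5",
    "TOK_SELECT": ["col1", "col2"],
    "TOK_DESTINATION": "TOK_TMP_FILE",
}
plan = build_operator_chain(qb)
for operator, payload in plan:
    print(operator, payload)
```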
  • 74. Optimizer: Select col1,col2 From tab1 Where col3 > 5, with tab1 {col1, col2, col3, col4, col5, col6, col7}. ColumnPruner walks the operator tree (TableScanOperator, FilterOperator, SelectOperator, FileSinkOperator) with a Context over TS, FIL and SEL: SEL needs col1, col2; FIL needs col1, col2, col3; TS therefore reads only col1, col2, col3
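The column-pruning walk on the Optimizer slides can be sketched as a backward pass over the operator chain. This is a simplified model, not Hive's ColumnPruner: the `Operator` class, its `uses` and `passes_through` fields, and `prune_columns` are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str                                  # TS, FIL, SEL or FS, as abbreviated on the slides
    uses: set = field(default_factory=set)     # columns referenced by this operator's own expressions
    passes_through: bool = True                # False for SEL, which emits only its own expressions


def prune_columns(chain):
    """Walk from the sink back to the scan and record the columns each operator must receive."""
    needed = {}
    required = set()
    for op in reversed(chain):
        if op.passes_through:
            # The operator forwards its input, so it needs what downstream needs plus its own columns.
            required = required | op.uses
        else:
            # A projection resets the requirement to the columns it reads itself.
            required = set(op.uses)
        needed[op.name] = set(required)
    return needed


# Operator chain for: Select col1,col2 From tab1 Where col3 > 5
chain = [
    Operator("TS"),
    Operator("FIL", uses={"col3"}),
    Operator("SEL", uses={"col1", "col2"}, passes_through=False),
    Operator("FS"),
]
needed = prune_columns(chain)
print(sorted(needed["TS"]))
```

Under this model the table scan ends up needing col1, col2 and col3 out of the seven columns of tab1, which matches the final state shown on the slides.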
  • 87. Task: Select col1,col2 From tab1 Where col3 > 5 (diagram labels): TaskFactory; rules TS - GenMRTableScan1 and FS - GenMRFileSink1; QB; operator tree TableScanOperator, FilterOperator, SelectOperator, FileSinkOperator; generated tasks FetchTask and MapRedTask
  • 88. Hive Internal (diagram labels, with the operators of the example query): Web UI, Hive CLI, JDBC (Browse, Query, DDL); MetaStore; Thrift API; Hive QL (Parser, Plan, Optimizer, Task); Map Reduce (TSOperator, FILOperator, SELOperator, FSOperator, ExecMapper/ExecReducer); User Script; UDF; SerDe; Input/OutputFormat; StorageHandler; RCFile; HDFS; DB; HBase; ...
  • 90. Oracle Migration to Hive
  • 91. l l l l Friday, July 1, 2011
  • 92. l l l l l l l l Friday, July 1, 2011
  • 93. l l l l l l l l Friday, July 1, 2011
  • 94. Understand Oracle SQL • More than 3,000 ETL SQL statements • Understand the data flow • Group similar SQL patterns • Investigate the Oracle functions in use
  • 102. Data Model Convert (Oracle -> Hive): Table -> Table; Partition -> Partition; Sampling -> Bucket
  • 111. DataType Convert (Oracle -> Hive): NUMBER(n) -> TINYINT, INT/BIGINT; NUMBER(n,m) -> FLOAT/DOUBLE; VARCHAR2 -> STRING; DATE -> STRING in "yyyy-MM-dd HH:mm:ss" format
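The type mapping above could be automated roughly as follows. This is a sketch under stated assumptions: the slide does not give the precision cut-offs for TINYINT, INT and BIGINT, so the thresholds below (2, 9 and 18 digits) are assumptions based on the value ranges of those types, and choosing DOUBLE over FLOAT for NUMBER(n,m) is also an assumption. The function name `oracle_to_hive_type` is hypothetical.

```python
import re
from datetime import datetime

# Python strftime spelling of the slide's "yyyy-MM-dd HH:mm:ss" format.
HIVE_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def oracle_to_hive_type(oracle_type):
    """Map an Oracle column type to a Hive type following the slide's table."""
    t = oracle_type.strip().upper()
    if t.startswith("VARCHAR2") or t == "DATE":
        return "STRING"
    match = re.fullmatch(r"NUMBER\((\d+)(?:,\s*(\d+))?\)", t)
    if match is None:
        raise ValueError(f"type not covered by the slide's mapping: {oracle_type}")
    precision = int(match.group(1))
    scale = int(match.group(2)) if match.group(2) is not None else 0
    if scale > 0:
        return "DOUBLE"  # assumption: the slide allows FLOAT or DOUBLE
    # Assumed thresholds: every n-digit integer fits in the chosen type.
    if precision <= 2:
        return "TINYINT"
    if precision <= 9:
        return "INT"
    if precision <= 18:
        return "BIGINT"
    raise ValueError(f"precision too large for BIGINT: {oracle_type}")


for oracle_type in ["NUMBER(2)", "NUMBER(9)", "NUMBER(12)", "NUMBER(10,2)", "VARCHAR2(30)", "DATE"]:
    print(oracle_type, "->", oracle_to_hive_type(oracle_type))

# An Oracle DATE value rendered as the STRING that Hive would store.
print(datetime(2011, 7, 1, 9, 30, 0).strftime(HIVE_DATE_FORMAT))
```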
  • 112. HIVE DML • Hive supports a subset of ANSI SQL • Subqueries are supported only in the FROM clause • Join queries: equi-join/inner-join, outer-join, self-join
  • 116. IN Clause, IN SubQuery. Oracle: SELECT * FROM Employee e WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d). Hive: SELECT * FROM Employee e LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)
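A small experiment shows why the IN subquery maps to a semi join rather than a plain join. SQLite, used here through Python's `sqlite3` module, has no LEFT SEMI JOIN, so the sketch emulates semi-join semantics with EXISTS; the sample rows are made up, and Dept deliberately contains a duplicate DeptNo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, DeptNo INTEGER);
    CREATE TABLE Dept (DeptNo INTEGER);
    INSERT INTO Employee VALUES ('kim', 10), ('lee', 20), ('park', 30);
    INSERT INTO Dept VALUES (10), (10), (20);  -- duplicate key on purpose
""")

# Original Oracle-style IN subquery.
in_rows = conn.execute(
    "SELECT e.Name, e.DeptNo FROM Employee e "
    "WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d) ORDER BY e.Name"
).fetchall()

# A plain inner join repeats an employee once per matching Dept row.
join_rows = conn.execute(
    "SELECT e.Name, e.DeptNo FROM Employee e "
    "JOIN Dept d ON (e.DeptNo = d.DeptNo) ORDER BY e.Name"
).fetchall()

# Semi-join semantics (emulated with EXISTS): each employee at most once.
semi_rows = conn.execute(
    "SELECT e.Name, e.DeptNo FROM Employee e "
    "WHERE EXISTS (SELECT 1 FROM Dept d WHERE d.DeptNo = e.DeptNo) ORDER BY e.Name"
).fetchall()

print(in_rows)
print(join_rows)
print(semi_rows)
```

The semi-join result matches the IN subquery, while the plain join returns an extra row for the duplicated department key.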
  • 120. NOT IN Clause, NOT IN SubQuery. Oracle: SELECT * FROM Employee e WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d). Hive: SELECT e.* FROM Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) WHERE d.DeptNo IS NULL
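The NOT IN rewrite can be checked on a toy data set. Both statements below are standard SQL, so they run unchanged in SQLite through Python's `sqlite3` module; the sample rows are made up. The two forms agree here because neither DeptNo column contains NULL. With NULLs present, NOT IN and the outer-join form can return different rows, so the rewrite should be validated against the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, DeptNo INTEGER);
    CREATE TABLE Dept (DeptNo INTEGER);
    INSERT INTO Employee VALUES ('kim', 10), ('lee', 20), ('park', 30), ('choi', 40);
    INSERT INTO Dept VALUES (10), (20);
""")

# Original Oracle-style NOT IN subquery.
not_in_rows = conn.execute(
    "SELECT e.Name, e.DeptNo FROM Employee e "
    "WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d) ORDER BY e.Name"
).fetchall()

# Rewrite from the slide: keep the employees that found no matching Dept row.
anti_join_rows = conn.execute(
    "SELECT e.Name, e.DeptNo FROM Employee e "
    "LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) "
    "WHERE d.DeptNo IS NULL ORDER BY e.Name"
).fetchall()

print(not_in_rows)
print(anti_join_rows)
```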
  • 124. JOIN Operator, JOIN. Oracle: SELECT * FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id. Hive: SELECT * FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)
  • 133. Functions (Oracle -> Hive) • Math Function: round, ceil, mod, power, sqrt, sin/cos -> round, ceil, pmod, power, sqrt, sin/cos • Character Function: substr, trim, lpad/rpad, ltrim/rtrim, replace -> substr, trim, lpad/rpad, ltrim/rtrim, regexp_replace • NULL Function: coalesce, nvl, nvl2 -> coalesce (no NVL, NVL2)
  • 134. Custom UDF Function • Condition Function: DECODE, GREATEST • Null Comparison Function: NVL / NVL2 • Type Conversion: TO_NUMBER, TO_CHAR, TO_DATE • INSTR4 • DATE_FORMAT • LAST_DAY
  • 135. Oracle Analytic Function
  • 140. Analytic Function, RANK. Oracle: SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp. Hive, with RANK(arg1, arg2) as a custom UDF: SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary) FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e
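The slides do not show the custom RANK UDF itself, so the following is only a guess at its logic, written as a Python generator with the hypothetical name `rank_sorted_rows`. It assumes the rows of one dept arrive contiguously and already sorted by salary descending, which is what DISTRIBUTE BY dept SORT BY dept, salary DESC is there to provide, and it reproduces Oracle's RANK behavior of giving ties the same rank and leaving a gap afterwards.

```python
def rank_sorted_rows(sorted_rows):
    """Yield (name, dept, salary, rank) for rows pre-sorted by dept, then salary descending."""
    prev_dept = None
    prev_salary = None
    position = 0  # 1-based position of the current row inside its dept
    rank = 0
    for name, dept, salary in sorted_rows:
        if dept != prev_dept:
            # A new partition starts: reset the counters.
            position = 1
            rank = 1
        else:
            position += 1
            if salary != prev_salary:
                # Ties keep the previous rank; a new value takes its position.
                rank = position
        prev_dept, prev_salary = dept, salary
        yield (name, dept, salary, rank)


# Made-up sample data.
emp = [
    ("kim", "sales", 300),
    ("lee", "sales", 300),
    ("park", "sales", 100),
    ("choi", "dev", 500),
    ("jung", "dev", 400),
]
# Stand-in for DISTRIBUTE BY dept SORT BY dept, salary DESC.
sorted_emp = sorted(emp, key=lambda row: (row[1], -row[2]))
ranked = list(rank_sorted_rows(sorted_emp))
for row in ranked:
    print(row)
```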
  • 145. Analytic Aggregation Function, MIN. Oracle: SELECT dept, MIN(salary) OVER (PARTITION BY dept) FROM emp. Hive, as aggregation + JOIN: SELECT emp.dept, tmp.m FROM emp JOIN (SELECT dept, MIN(salary) m FROM emp GROUP BY dept) tmp ON (emp.dept = tmp.dept)
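The aggregation + JOIN rewrite can be tried on a toy table. The sketch below runs the rewritten query in SQLite through Python's `sqlite3` module and compares it with a per-row minimum computed directly in Python, which stands in for MIN(salary) OVER (PARTITION BY dept); the sample rows and the extra ORDER BY are assumptions added to make the comparison deterministic.

```python
import sqlite3

# Made-up sample data.
rows = [("kim", "sales", 100), ("lee", "sales", 200), ("park", "dev", 300)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# Rewrite from the slide: aggregate per dept, then join back to every emp row.
joined = conn.execute(
    "SELECT emp.dept, tmp.m FROM emp "
    "JOIN (SELECT dept, MIN(salary) AS m FROM emp GROUP BY dept) tmp "
    "ON (emp.dept = tmp.dept) ORDER BY emp.name"
).fetchall()

# Reference result: what the analytic MIN would return for each row.
dept_min = {}
for _, dept, salary in rows:
    dept_min[dept] = min(salary, dept_min.get(dept, salary))
expected = [(dept, dept_min[dept]) for _, dept, _ in sorted(rows)]

print(joined)
print(expected)
```

Like the analytic form, the rewrite keeps one output row per employee rather than one per department.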
  • 149. Merge Join Tree Bug: the same joins yield a different number of MapReduce jobs depending on join order • select * from a join b on a.v1 = b.v1 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join e on a.v2 = e.v2 (MapReduce #3) • select * from a join e on a.v2 = e.v2 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join b on a.v1 = b.v1 (MapReduce #2)
  • 150. Merge Join Tree Bug Fix • SemanticAnalyzer, original code:
        private void mergeJoinTree(QB qb) {
          QBJoinTree root = qb.getQbJoinTree();
          QBJoinTree parent = null;
          while (root != null) {
            boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());
            if (parent == null) {
              if (merged) {
                root = qb.getQbJoinTree();
              } else {
                parent = root;
                root = root.getJoinSrc();
              }
            } else {
              parent = parent.getJoinSrc();
              root = parent.getJoinSrc();
            }
          }
        }
  • 151. Merge Join Tree Bug Fix • SemanticAnalyzer, fixed code (the outer else branch now also checks merged):
        private void mergeJoinTree(QB qb) {
          QBJoinTree root = qb.getQbJoinTree();
          QBJoinTree parent = null;
          while (root != null) {
            boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());
            if (parent == null) {
              if (merged) {
                root = qb.getQbJoinTree();
              } else {
                parent = root;
                root = root.getJoinSrc();
              }
            } else {
              if (merged) {
                root = qb.getQbJoinTree();
              } else {
                parent = parent.getJoinSrc();
                root = parent.getJoinSrc();
              }
            }
          }
        }
  • 155. New HQL Syntax: INSERT INTO • INSERT INTO table VALUES(col1 ... coln) SELECT ... FROM tmp ... • INSERT [OVERWRITE] destination: grammar, modify FileSinkPlan • New Feature - HIVE-306: INSERT INTO destination
  • 164. Tuning • Hadoop Tuning: mapred.job.reuse.jvm.num.tasks, mapred.child.java.opts, mapred.min.split.size / mapred.max.split.size, dfs.block.size • Hive Tuning: hive.input.format = CombineHiveInputFormat; query tuning - reduce the number of MapReduce jobs using the HQL plan
  • 174. Wrap-Up, Oracle 2 Hive • Gain insight into the data flow & model • Modify Oracle SQL to Hive query syntax • Use built-in functions • Develop custom UDF/UDAF/UDTF • Support analytic functions: distribute by + sort by + udf; join + udf (aggregation) • Modify Hive internals • Hadoop + Hive tuning