Micro Data Center &
Hadoop Big Data Warehouse
V0.01, r 2018
Open Source Platform
● Micro Data Center 25 TB, Small Business Solution (Plug & Play)
● Hadoop Open Source Technology
● Hive Data Warehouse
● Hadoop Testing Data Model
● Software & Tools Library
● Business Intelligence Reports
Infrastructure V0.07
● Micro Data Center - full redundancy: ISP, RV325 Router, RV340W Wireless Router, SF200-24 Switch, NAS Server Xeon Storage, NAS Server, UPS
● Ubuntu Linux - 2 GHz dual-core processor or better, 2 GB system memory, 25 GB of free hard drive space, network & Internet access
● Hadoop HDFS - Java package, Hadoop 2.7.3 package, bash file (.bashrc), NameNode, DataNode, ResourceManager, NodeManager
● Hive Data Warehouse - Hive Metastore warehouse: JDBC/ODBC, CLI, Hive Thrift Server, Hive Web Interface, Driver, MapReduce, HiveQL, Impala, Cloudera
● Report - Business Intelligence, Excel, report archiving, backup & recovery, cloud storage
Micro Data Center (Hardware Specification)
Networking:
1. Cisco Small Business RV325 Router - 14-port Gigabit Ethernet
2. Cisco Small Business RV340W Wireless Router - 2.4 GHz / 5 GHz
3. Cisco Small Business Smart SF200-24 Switch - 24 Ethernet ports
Central Storage:
1. WD Sentinel DS5100 WDBYVE0080KBK Server Xeon - 15 TB
2. Seagate Personal Cloud STCR3000101 NAS Server - 5 TB
3. Seagate Backup Plus External Hard Drive - 5 TB
Redundant Power UPS:
1. OL1000RTXL2U, runtime @ 450 W: 20 min
Network Connectivity
Micro Data Center Storage v0.07 & Linux Ubuntu Workstation 18.04 LTS
Connect to Micro Data Center Storage:
1. Connect to the network/Wi-Fi router
2. Personal Cloud: http://192.168.1.82/ (device user name & password)
3. Network configuration for Micro Data Center storage (a mount sketch follows below)
Linux Ubuntu (Workstation) to Micro Data Center:
1. Boot from USB/DVD
2. Prepare to install Ubuntu
3. Allocate drive space
4. Begin installation
5. Log in as the storage admin user
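A hedged sketch of attaching the storage share from the Ubuntu workstation, assuming the Personal Cloud NAS exposes an SMB share named Public at 192.168.1.82 (the share name and credentials are illustrative, not taken from the slides):
$ sudo apt install cifs-utils
$ sudo mkdir -p /mnt/mdc-storage
$ sudo mount -t cifs //192.168.1.82/Public /mnt/mdc-storage \
      -o username=<device-user>,password=<device-pw>
$ df -h /mnt/mdc-storage    # confirm the share is mounted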
Hadoop Installation v2.7.3
Install Hadoop
1. Download & installation:
   a. Linux Ubuntu
   b. Java JDK
   c. Vim CLI (command line interface)
   d. hadoop-2.6.5.tar.gz
2. Group & admin user for Hadoop
3. Configuration (a command sketch follows below):
   a. sysctl.conf (disable IPv6)
   b. Generate a public/private RSA key pair
   c. ssh localhost
   d. .bashrc (Hadoop variables)
   e. Hadoop core conf files (hadoop-env.sh, core-site.xml, mapred-site.xml & hdfs-site.xml)
   f. NameNode, DataNode & hadoop_store directories
   g. NameNode format
   h. start-all.sh, jps
4. Hadoop daemons via jps (ResourceManager, SecondaryNameNode, NodeManager & DataNode)
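A minimal shell sketch of steps 2-4 on a single node; the hadoop group, hduser account, /usr/local/hadoop install path and the 2.7.3 tarball name are illustrative assumptions, not taken verbatim from the slides:
$ sudo addgroup hadoop && sudo adduser --ingroup hadoop hduser   # group & admin user for Hadoop
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa                       # public/private RSA key pair
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && ssh localhost
$ sudo tar -xzf hadoop-2.7.3.tar.gz -C /usr/local && sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
$ echo 'export HADOOP_HOME=/usr/local/hadoop' >> ~/.bashrc       # Hadoop variables in .bashrc
$ echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
$ source ~/.bashrc
$ hdfs namenode -format                                          # NameNode format
$ start-all.sh && jps                                            # start the daemons, verify with jps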
Hadoop Web Interface
● http://localhost:50070/ - web UI of the NameNode daemon:
NameNode summary report, Security, Safemode status, DFS Used %, DFS Remaining %, Block Pool Used, DataNode usage %, Live Nodes, Dead Nodes, Decommissioning Nodes, Number of Under-Replicated Blocks, NameNode Journal Status, Journal Manager, NameNode Storage
● Datanode information:
Node, Admin State, Capacity, Used, Non DFS Used, Remaining, Block Pool Used, Failed Volumes
● Browsing HDFS:
Browse Directory, Permission, Owner, Group, Size, Block Size, Folder Name
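Much of the same status can be pulled from the command line; a small sketch using standard HDFS admin commands (not specific to this deck):
$ hdfs dfsadmin -report        # capacity, DFS used/remaining, live and dead DataNodes
$ hdfs dfsadmin -safemode get  # Safemode status
$ hdfs fsck /                  # block health, including under-replicated blocks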
Hive
Data Warehouse Implementation v2.0
Step 1. Hive Installation
• Download Hive
• Configure ~/.bashrc and set the environment variables
Step 2. Hive Warehouse Directory Creation
• Hive runs on top of Hadoop, so Hadoop must be installed and on the PATH
• Create the Hive warehouse directory in HDFS (see the sketch below)
Step 3. Hive Configuration
• Configure Hive with Hadoop
• Configure the "hive-env.sh" file
• Point Hive at an external database to configure the Metastore
Step 4. Hive Data Warehouse Files Location
• $ hadoop fs -ls /user/hive/warehouse
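A short sketch of Steps 1-2, assuming Hive is unpacked to /usr/local/hive (the path is illustrative):
$ echo 'export HIVE_HOME=/usr/local/hive' >> ~/.bashrc
$ echo 'export PATH=$PATH:$HIVE_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc
$ hadoop fs -mkdir -p /tmp /user/hive/warehouse     # Hive warehouse directory in HDFS
$ hadoop fs -chmod g+w /tmp /user/hive/warehouse    # Hive needs group write on both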
VirtualBox Installation v5.1.34
R&D platform (Cloudera QuickStart)
● Download from virtualbox.org (Oracle VM VirtualBox)
● Configure network interfaces
● Open the VM VirtualBox Manager
● Import the appliance
● Appliance settings: name, guest OS type, 2 CPUs, 10 GB RAM, DVD, network adapter
● Enable network:
● Adapter 1: Intel PRO/1000 MT Desktop (NAT)
● Adapter 2: Host-only adapter (VirtualBox Host-only Ethernet Adapter 2)
● System: Motherboard, base memory: 10 GB, processor: 2 CPUs
● VirtualBox running: booting CentOS 6 (2.6.32-573.el6.x86_64)
(An equivalent VBoxManage command sketch follows below.)
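The import and network settings above can also be scripted with VBoxManage; a sketch, assuming the appliance file is cloudera-quickstart.ova and the VM is named cloudera-quickstart (both names are illustrative):
$ VBoxManage import cloudera-quickstart.ova --vsys 0 --vmname cloudera-quickstart
$ VBoxManage modifyvm cloudera-quickstart --memory 10240 --cpus 2    # 10 GB RAM, 2 CPUs
$ VBoxManage modifyvm cloudera-quickstart --nic1 nat                 # Adapter 1: NAT
$ VBoxManage modifyvm cloudera-quickstart --nic2 hostonly \
      --hostonlyadapter2 "VirtualBox Host-only Ethernet Adapter 2"   # must match an existing host-only adapter
$ VBoxManage startvm cloudera-quickstart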
Tools & Software Library
WinSCP - FTP interface between Windows & Linux
● SSH and SCP support based on PuTTY
● Login: New Site > file protocol: SFTP, host name/IP: 192.168.56.101, port number: 22, and username/password
● Upload & download files: upload to the Linux/Windows OS and download to the Windows/Linux OS
PuTTY Key Generator:
● Public key for pasting into the OpenSSH authorized_keys file
● Type of key parameters: RSA; save the public key
PuTTY release 0.70:
● Host name/IP: 3.17.0.143 & port 22
ETL (extract, transform, load) / ELT (extract, load, transform): Sqoop
Cloudera CDH 5.3
Business Data Testing & Analysis, 12k+ Customers
Hadoop, Hive & Impala (SQL); source: the cloudera@quickstart VM
MySQL (retail_db) :
mysql> show databases;
mysql> use retail_db;
mysql> select count(*) from customers;
Sqoop :
[cloudera@quickstart ~]$ sqoop import-all-tables 
Hive (retail_db) :
hive> show databases;
hive> use default;
hive> show tables;
hive> select count(*) from customers;
Hive Data Warehouse :
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/
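The sqoop import-all-tables call above pulls every retail_db table from MySQL into the Hive warehouse; a fuller sketch under the usual Cloudera QuickStart conventions (the JDBC URL and the retail_dba/cloudera credentials are the QuickStart tutorial defaults, adjust to your environment):
[cloudera@quickstart ~]$ sqoop import-all-tables \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba --password cloudera \
    --hive-import --hive-overwrite --create-hive-table \
    --warehouse-dir /user/hive/warehouse \
    -m 1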
Business Intelligence Report:
Most popular product categories
Top 10 revenue generating products (a query sketch follows after the tables below)
MySQL (retail_db)
mysql> show tables;
Table         Records
categories         58
customers      12,435
departments         6
order_items    68,883
orders        172,198
products        1,345
Big Data / Hive / Impala
hive> show tables;
Table         Records
categories         58
customers      12,435
departments         6
order_items    68,883
orders        172,198
products        1,345
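A hedged HiveQL sketch for the "Top 10 revenue generating products" report above, assuming the standard retail_db column names (order_item_product_id, order_item_subtotal, product_id, product_name):
hive> SELECT p.product_name, SUM(oi.order_item_subtotal) AS revenue
    > FROM order_items oi
    > JOIN products p ON oi.order_item_product_id = p.product_id
    > GROUP BY p.product_name
    > ORDER BY revenue DESC
    > LIMIT 10;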
AWS Cloud Services - Ubuntu Server 18.04 LTS
AWS Management Console:
● Step 1: Choose an Amazon Machine Image (Ubuntu Server 18.04 LTS)
● Step 2: Choose an Instance Type
● Step 3: Configure Instance Details
● Step 4: Add Storage
● Step 5: Add Tags
● Step 6: Configure Security Group
● Step 7: Review Instance Launch
Connect from the AWS Management Console:
● Connect: ubuntu@ip-172.47.106:~$
● Generate a private key with PuTTYgen
● Connect to AWS from PuTTY
● File transfer with WinSCP (SFTP)
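A hedged command-line alternative to the PuTTY/WinSCP steps, assuming the key pair downloaded at launch is mdc-key.pem (the file name is illustrative):
$ chmod 400 mdc-key.pem                                          # the private key must not be world-readable
$ ssh -i mdc-key.pem ubuntu@<instance-public-ip>                 # Ubuntu AMIs log in as 'ubuntu'
$ scp -i mdc-key.pem report.csv ubuntu@<instance-public-ip>:~/   # file transfer without WinSCP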
Business Intelligence Reporting Tools v2.65
Report Generation
● Open the BI window
● Run Power BI Desktop
● Import data from different sources
● Connect the dataset
● Load data into BI
● Manage data per query
● Export the BI report / export to PDF
Prototype Demo: Cloudera (Remote Login)
Connected over the cloud through:
● TeamViewer
● Windows 10
● VirtualBox
● Start the Cloudera desktop
● Cloudera CLI terminal
● Run the MySQL database
● Run the Hive open source database
● Cloudera QuickStart Hive/Impala SQL terminal
● SQL data analysis
FTP (File Transfer Protocol)
● Run WinSCP
● Connect the Windows desktop to the Linux desktop
Business Intelligence Report
● Visual analytics at your fingertips: create interactive data visualizations and reports.
Meetup
Registration:
● Free orientation
● Prototype demo
● Consultancy
Info & registration:
Micro DataCenter & Data Warehouse
MDCDWH@gmail.com
https://goo.gl/forms/SuCTolEeZNNIL35V2