Big Data Project Experience:
Industry: Manufacturing Project: Panera, LLC
Company: CenturyLink Technology, Noida, IN Duration: April 2016 – Present (7 Months)
Designation: Consultant Role: Big Data Developer
Project Description: Panera, LLC is an American chain of fast-casual bakery-café restaurants in
the United States and Canada. CenturyLink has an SOW with Panera, LLC for capacity planning and
production setup. The client required identification of a methodology for tying the online
business workload at the order level to the actual utilization of the IT infrastructure, and the
building of sample/representative dashboards depicting measures of IT resource utilization per
order.
My responsibilities are to develop and test ETL jobs in Spark with Scala (previously
Python) to speed up parsing of distributed unstructured data arriving from different sources via
Flume and Kafka, to build regression models such as Random Forest and Gradient-Boosted Trees on
LIBSVM data files (a sketch of such an MLlib job follows the list below), and to fix Spark
environment issues.
Responsibilities/Deliverables:
• Developed Spark ETL jobs to parse huge amounts of unstructured data.
• Developed Spark MLlib jobs to create regression models on structured data.
• Worked with IntelliJ IDEA and the SBT build tool.
• Developed UI for visualization of reports in D3.js and Zoomdata.
• Software development and automation for applications and system monitoring.
• Worked on the Cloudera Distribution of Hadoop (CDH).
• Exposure to data manipulation with Hive queries.
• Exposure to scheduling jobs in Oozie.
• Exposure to creating detailed design documents for the project.
• Secured data using Apache Sentry authorization.
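
For illustration, a minimal sketch of an MLlib regression job of the kind described above, assuming Spark 1.6-era RDD APIs; the input path, column counts, and parameters are hypothetical placeholders:

```scala
// Gradient-Boosted Trees regression on a LIBSVM file (sketch, not the
// project's actual code); path and parameters are illustrative.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache.spark.mllib.util.MLUtils

object GbtRegressionJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("gbt-regression"))

    // Load LIBSVM-formatted data as an RDD[LabeledPoint].
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/orders.libsvm")
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Configure boosting for regression and train the model.
    val strategy = BoostingStrategy.defaultParams("Regression")
    strategy.numIterations = 50
    val model = GradientBoostedTrees.train(train, strategy)

    // Evaluate with mean squared error on the held-out split.
    val mse = test.map { p =>
      val err = model.predict(p.features) - p.label
      err * err
    }.mean()
    println(s"Test MSE = $mse")

    sc.stop()
  }
}
```

A Random Forest variant would follow the same shape via MLlib's RandomForest.trainRegressor.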
Industry: Telecom Project: CTL-Cloudera Big Data as a Service
Company: CenturyLink Technology, Noida, IN Duration: September 2016 to Present (2 Months)
Designation: Consultant Role: Big Data Developer
Project Description: See press report: j.mp/2cDr5nO
My responsibilities are to develop an automation API framework in Java/Python that sets up
and manages clusters with all services up and running automatically (a sketch of calling the
Cloudera Manager REST API follows the list below).
Responsibilities/Deliverables:
• Developed an automation API to deploy clusters via the Cloudera Manager REST API.
• Developed structured cluster templates for automation.
• Software development and automation for applications and Ganglia system
monitoring.
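
As an illustration of driving the Cloudera Manager REST API from code, a minimal sketch in Scala using the HTTP client built into Java 11+; the host, credentials, and API version are hypothetical placeholders:

```scala
// List clusters via the Cloudera Manager REST API (sketch); a fuller
// automation framework would POST cluster templates and poll command
// status the same way. Host, credentials, and /api/v12 are illustrative.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.util.Base64

object CmClusterList {
  def main(args: Array[String]): Unit = {
    val auth = Base64.getEncoder.encodeToString("admin:admin".getBytes("UTF-8"))
    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://cm-host.example.com:7180/api/v12/clusters"))
      .header("Authorization", s"Basic $auth")
      .GET()
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}
```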
Industry: Telecom Project: CTL Data Lake – PD
Company: CenturyLink Technology, Noida, IN Duration: April 2016 – June 2016 (3 Months)
Designation: Consultant Role: Big Data Developer
Project Description: CTL Data Lake is a CenturyLink internal project to build an application
for comprehensive data access and management, and then apply data analytics on scalable
data.
My responsibilities were to develop and test a REST interface for a data pipeline that takes
data from the customer, dumps it to a Kafka topic, parses it with Spark Streaming, and stores it
to an HBase table and HDFS (the Kafka-to-Spark-Streaming leg is sketched after the list below).
Responsibilities/Deliverables:
• Developed a data pipeline with a REST Java API that publishes to Kafka, with HBase as a
downstream consumer.
• Developed Flume integration with Kafka.
• Worked with Eclipse Mars and the Maven build tool.
• Developed a Spark Streaming API integrated with Kafka.
• Exposure to real-time streaming jobs.
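
A minimal sketch of the Kafka-to-Spark-Streaming leg of this pipeline, assuming the Spark 1.6-era spark-streaming-kafka (Kafka 0.8 direct stream) integration; broker, topic, and output path are hypothetical placeholders:

```scala
// Consume a Kafka topic with Spark Streaming and persist micro-batches
// to HDFS (sketch); a parallel sink could write the same records to HBase.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object PipelineStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("datalake-stream"), Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("customer-events"))

    // Keep each record's value and write non-empty batches to HDFS.
    stream.map(_._2)
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) rdd.saveAsTextFile(
          s"hdfs:///datalake/raw/batch-${System.currentTimeMillis()}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```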
Industry: Telecom Project: AT&T Insights
Company: Amdocs, Gurgaon, IN Duration: November 2014 to March 2016 (1 Year 5 Months)
Designation: Software Engineer Role: Big Data Developer
Project Description: AT&T is the second-largest provider of mobile telephone services and the
largest provider of fixed telephone services in the United States, and also provides broadband,
as well as subscription television services through DirecTV.
AT&T Insights is a module of the Amdocs CRM application.
My responsibilities on the Insights project: data ingestion to HBase from structured and
unstructured data sources, and development of the Insights Spark API for reading data from
HBase storage with a Kafka producer, providing fast data access to multiple applications
(U-verse, CRM, testing applications) at the same time (an HBase-read sketch follows the list
below).
Responsibilities/Deliverables:
• Developed a Spark framework in Scala with HBase as storage.
• Developed Flume integration with Kafka to load unstructured data.
• Developed HiveQL for analysis of huge telecom datasets.
• Developed MapReduce jobs and UDFs in core Java.
• Built an automatic data ingestion platform for migrating data between Oracle and HBase.
• Worked with distributed GigaSpaces grid clusters for the Insights application.
• Exposure to real-time streaming jobs and batch jobs.
• Software development and automation for applications and system monitoring.
• Developed modules for database connections and structured programming.
• Experience with both the Hortonworks and Cloudera (CDH) distributions of Hadoop.
• Developed log analysis and real-time monitoring tools for the production application.
• Exposure to ETL job creation, flow diagrams, and job scheduling as DAGs.
• Visualization of reports for the client using Tableau.
• Exposure to Ganglia, Kerberos, and Hadoop metrics.
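
A minimal sketch of reading an HBase table into Spark as described above, using the standard TableInputFormat Hadoop integration; the table, column family, and qualifier names are hypothetical placeholders:

```scala
// Scan an HBase table as an RDD of (rowkey, Result) pairs (sketch);
// downstream, each record could be handed to a Kafka producer for
// consuming applications such as U-verse or CRM.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object InsightsHBaseRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("insights-read"))

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "insights")

    val rows = sc.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Extract one column per row; "cf" and "metric" are illustrative names.
    val values = rows.map { case (key, result) =>
      val v = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("metric"))
      (Bytes.toString(key.get()), Option(v).map(Bytes.toString).orNull)
    }
    values.take(10).foreach(println)

    sc.stop()
  }
}
```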