Informatics Ambareesh Kulkarni
Informatics defined
  • Informatics is the application of technology to bring Data, People and Systems together
  • Bioinformatics is a very Complex representation of Simple data
  • Cheminformatics is a very Simple representation of Complex data
Current State
Problem Statement
“There's too much data and it's duplicated hundreds of times. The mistake companies make is that they start from the data they have. They need to ask what data do their users need and what are the questions they are asking. Understand the questions, how they can be answered and what kind of data is needed.” (Quote by CIO of Major Corporation)
Integrated Solutions - Business Case: IDC White Paper
Information Tasks:
  • Email – 14.5 hours a week
  • Create documents – 13.3 hours a week
  • Search – 9.5 hours a week
  • Gather information for documents – 8.3 hours a week
  • Find and organize documents – 6.8 hours a week
Gartner: “Organizations spend an estimated $750 Billion annually seeking information necessary to do their job.”
Data Integration - Business Case: IDC White Paper
Time Wasted (per year):
  • Reformatting information – $57 million per 10,000 users
  • Not finding information – $53 million per 10,000 users
  • Recreating content – $45 million per 10,000 users
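The three cost lines above can be added up to give a sense of scale. The sketch below is simple arithmetic on the slide's figures; the total and the per-user value are derived here and are not numbers quoted from the IDC paper.

```python
# Annual cost of wasted time per 10,000 users, in millions of USD (figures from the slide).
WASTE_PER_10K_USERS_MUSD = {
    "Reformatting information": 57,
    "Not finding information": 53,
    "Recreating content": 45,
}


def total_waste_musd(costs):
    """Sum the individual cost lines (millions of USD per 10,000 users)."""
    return sum(costs.values())


def waste_per_user_usd(costs, users=10_000):
    """Convert the total into USD per user per year."""
    return total_waste_musd(costs) * 1_000_000 / users


if __name__ == "__main__":
    print(total_waste_musd(WASTE_PER_10K_USERS_MUSD))    # 155
    print(waste_per_user_usd(WASTE_PER_10K_USERS_MUSD))  # 15500.0
```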
Data Integration - Business Case: General ROI issues (IDC White Paper)
  • Reduce development costs, cycle times
  • Increase employee efficiency: less time looking, more time doing
  • Enhance communication
  • Capture and reuse knowledge
  • Innovate better & faster
  • Cost of not finding the right information: business – lost money, opportunities
Key Takeaways
  • Data integration is not easy and represents ~80% of the effort for a typical data integration project.
  • Incompatible data are the largest, most expensive, and most time-consuming portion of IT projects.
  • Most data is in an unstructured format (Outlook, Word, PDF, images, etc.).
Evolution of data integration technologies
Evolution of Integration Architectures: Point to Point, Hub + Spoke, Hub + EII
Defining EII, EAI, ETL (Data Integration)
EII vs. EAI
  • EII (Enterprise Information Integration): reports from multiple apps/data sources, e.g. real-time access to product silos for customers, employees
  • EAI (Enterprise Application Integration): transactions to multiple apps, e.g. compound name change in one application propagated to other products
EII vs. ETL
  • EII (real-time): Extract, Transform, Report in real time, e.g. report data from operational applications
  • ETL (batch): Extract, Transform, Load; later report on data warehouse, e.g. build duplicate reporting data mart and/or redesign data warehouse
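The EII versus ETL distinction can be illustrated with a small sketch. It uses Python's standard sqlite3 module with in-memory databases standing in for operational systems; the table names, column names and the name-normalising transform are assumptions made for illustration and do not come from the deck or from any specific product.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one operational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compounds (name TEXT, assay_value REAL)")
    conn.executemany("INSERT INTO compounds VALUES (?, ?)", rows)
    conn.commit()
    return conn


def transform(row):
    """Shared transform step: normalise the compound name."""
    name, value = row
    return (name.strip().lower(), value)


def eii_report(sources):
    """EII style: extract, transform and report at request time, straight from the sources."""
    combined = []
    for conn in sources:
        rows = conn.execute("SELECT name, assay_value FROM compounds")
        combined.extend(transform(row) for row in rows)
    return sorted(combined)


def etl_load(sources, warehouse):
    """ETL style: batch-extract all source rows, transform them, load a warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS compounds_dw (name TEXT, assay_value REAL)"
    )
    warehouse.execute("DELETE FROM compounds_dw")  # full reload on every batch run
    for conn in sources:
        rows = conn.execute("SELECT name, assay_value FROM compounds").fetchall()
        warehouse.executemany(
            "INSERT INTO compounds_dw VALUES (?, ?)", [transform(r) for r in rows]
        )
    warehouse.commit()


def warehouse_report(warehouse):
    """Report later, from the warehouse copy rather than the live sources."""
    return sorted(warehouse.execute("SELECT name, assay_value FROM compounds_dw"))


if __name__ == "__main__":
    source_a = make_source([("Aspirin", 1.0)])
    source_b = make_source([("caffeine ", 2.0)])
    warehouse = sqlite3.connect(":memory:")
    etl_load([source_a, source_b], warehouse)

    # A row added after the batch run is visible to EII but not to the warehouse report.
    source_a.execute("INSERT INTO compounds VALUES ('Ibuprofen', 3.0)")
    source_a.commit()
    print(eii_report([source_a, source_b]))
    print(warehouse_report(warehouse))
```

The trade-off shown is the one on the slide: the EII path always reflects the current state of the sources, while the warehouse report only changes when the next batch load runs.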
Tools vs. Development Platform (figure labels: Enterprise Application Requirements, Tools, Development Platform)
What do end users really care about?
  • The Internet has raised the bar for Informatics expectations
  • Complex query? Millions of rows? Full table scan? Users don’t really care.
  • If they can view stock prices in real time, why not corporate data?
  • In an ideal world, data analysis needs to be at the speed of thought.
  • Bigger, better, faster, cheaper
Business users' view (figure labels: Data, Pipeline Pilot, Reports)
IT perspective
Key Takeaways
  • Provide an integrated view of data across multiple systems: flat files, data warehouses, data marts.
  • Avoid “boiling the ocean”.
  • Jump-start data integration efforts with Pipeline Pilot (PP) to quickly meet an important user requirement, and then decide if the data should be persisted in a data warehouse or data mart.
  • Use Pipeline Pilot to:
Action from Insight Data is a New form of Energy
Why is data integration so important?
  • Data in any organization is distributed across various disconnected and disparate systems
  • There is always a need to combine the most current data with historical values
  • The success of the internet has created data sources outside the internal network
  • Data has informational value only when combined with other, related data
WARNING SIGNS of Poor Data Integration (each sign is followed by the example from its slide)
  • Incomplete data foundation: presentations or discussions that are prefaced with statements like “most of our analysis would have been accurate, except for the missing data from….” or “Due to discovery of data not included in the last analysis, we are reversing our decision to……”
  • Inability to consolidate data from multiple sources: as a result of an out-of-order condition for a critical chemical, a scientist must expedite the order and pay a premium price. When the chemical arrives, the scientist (or worse, her boss) discovers that another division had excess quantity of the same chemical and was looking to sell it at a discount.
  • No single version of the truth: scientists argue about the fact that analysis results differ, even though the data came from the same operational data source.
  • Poor audit trail and data lineage: a technician alerts his management team of scientists to a potential problem discovered while running a query against a database. The technician cannot, however, answer the follow-up question, “How long has the problem existed?”
  • Historical values not retained in a data warehouse or data mart: a scientist runs a report every week against a LIMS; however, to see a period-to-period comparison, the scientist maintains a spreadsheet into which he creates a new column every week and enters the data manually.
  • Lack of integrated 360 deg view: a customer calls tech support to enquire about a pending case. While the customer support engineer has access to the case details, the engineer has no information on whether the customer is current on maintenance, how many end-users they are licensed for, or what options the customer has purchased.
  • High cost of maintaining “one-time” in-house code: minor change requests take weeks to implement; any modifications have to be thoroughly tested for accuracy and integrity.
  • Inability to comply with regulatory requirements: the CEO and CFO are uncomfortable signing off on the quarterly numbers as there is no way to trace the numbers back to the source systems.
Case Study (closer to home): Services Order Report
  • Poor data quality
  • Redundant information
  • Duplicate entries
  • Hard to read
  • Huge amount of time required to clean it up
Information Quality is a Direct Function of:
  • Information-sensitivity
  • Data Availability and Accessibility
  • Data Quality
      DQ = Completeness × Validity
      E.g. Measure of Completeness = # of null values in a column
      E.g. Measure of Validity = “We have 4 regions, but there are 18 distinct values in the region column”
      Pitfall: Don’t take accountability for DQ on the source system; push accountability where it belongs, in the source system(s)
  • Timeliness of Data, relevant to the questions being asked by the user
  • SQL and programming accuracy
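The DQ = Completeness × Validity formula can be sketched for a single column. The slide does not say how each factor is scaled, so the definitions below are assumptions: completeness is taken as the fraction of non-null values, and validity as the fraction of non-null values that belong to an allowed set. The sample region values are invented for illustration.

```python
def _present(values):
    """Values that are neither None nor an empty string."""
    return [v for v in values if v is not None and v != ""]


def completeness(values):
    """Assumed definition: fraction of values in the column that are not null."""
    if not values:
        return 0.0
    return len(_present(values)) / len(values)


def validity(values, allowed):
    """Assumed definition: fraction of non-null values found in the allowed set."""
    present = _present(values)
    if not present:
        return 0.0
    return sum(1 for v in present if v in allowed) / len(present)


def data_quality(values, allowed):
    """DQ = Completeness x Validity, as on the slide."""
    return completeness(values) * validity(values, allowed)


if __name__ == "__main__":
    # Hypothetical region column: four valid regions, plus variants and a null.
    allowed_regions = {"North", "South", "East", "West"}
    region_column = ["North", "South", "East", "West", "north", "N/A", None, "West"]

    print("distinct values:", len(set(_present(region_column))))
    print("completeness:", completeness(region_column))
    print("validity:", validity(region_column, allowed_regions))
    print("DQ:", data_quality(region_column, allowed_regions))
```

Counting distinct values against the number of expected regions, as the slide's example does, is a quick first check; the ratio form makes it possible to track the score over time.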
Case Study (closer to home): Internal Revenue Forecasting process
Orders QTD | Pipeline | Delivered | Forecast
  • Run the Services Products and Orders report in RSVPP ……; export the results, filter for product services (Column AM) and sum the Total Sale Price USD column
  • Run the Services Opportunities report in SFDC; export the result……
  • Assuming Access is up to date………..; export to Excel; filter by product services and sum the USD Amount column
  • Assuming Access is up to date, run the Total Forecast report; export to Access; …………
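The manual "export, filter, sum" step in this process is the kind of task that is easy to script. The sketch below uses only Python's standard csv module on an exported report; the "Product Type" column name and the "Services" value are hypothetical stand-ins for the unnamed filter column (Column AM), while "Total Sale Price USD" is the column named on the slide.

```python
import csv
import io


def sum_services_usd(csv_text,
                     type_column="Product Type",          # hypothetical column name
                     amount_column="Total Sale Price USD"):
    """Filter exported rows for product services and sum the USD amount column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0.0
    for row in reader:
        if row[type_column].strip().lower() == "services":
            total += float(row[amount_column])
    return total


if __name__ == "__main__":
    # Invented sample export, for illustration only.
    export = (
        "Order,Product Type,Total Sale Price USD\n"
        "1,Services,1000.50\n"
        "2,Software,500\n"
        "3,Services,250\n"
    )
    print(sum_services_usd(export))  # 1250.5
```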
Near real-time data access
Extract, Transformation & Load = Push big data
  • Batch extract from transaction systems
  • Bulk transformation
  • Push load into data warehouse
(figure labels: Extract, Load, Transformation, Data Warehouse, Real Time)
Pipeline Pilot and Real-time Data Access
(figure labels: Data Access, Data Adapters, Data Transformation, Transform, Calculate, Security; data sources: Relational, Flat Files, ERP, Legacy, EJB, XML; Information Access: Web Services, ODBC, JDBC)
  • Flexible data access capabilities
  • Single access point to data
  • Consumer sees only the end result
  • Shared platform service
  • Available to all technologies
  • Reusable building blocks
  • Targeted to specific needs
  • Reduces costs and time to market
  • Supports incremental development
Case Study: PI Historian
  • PI Historian, a product provided by OSI, captures data in real time from the research test rigs
  • Data capture in PI is triggered by events
  • PP allows scientists to read the data from PI Historian as it becomes available and also combine it with other information (e.g. associate real-time test data with historical characteristics of a catalyst)
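The idea of combining event-triggered readings with historical characteristics can be sketched in plain Python. This does not use the PI Historian or Pipeline Pilot APIs; the `Reading` type, the field names and the sample catalyst record are all hypothetical and exist only to show the join.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One event-triggered measurement from a test rig (hypothetical shape)."""
    catalyst_id: str
    timestamp: str
    temperature_c: float


# Hypothetical historical characteristics, keyed by catalyst id.
HISTORICAL = {
    "CAT-001": {"supplier": "A", "avg_temperature_c": 350.0},
}


def enrich(readings, historical):
    """Yield each reading, as it becomes available, joined with its historical record."""
    for reading in readings:
        past = historical.get(reading.catalyst_id)
        yield {
            "catalyst_id": reading.catalyst_id,
            "timestamp": reading.timestamp,
            "temperature_c": reading.temperature_c,
            "historical": past,
            # Difference from the historical average, if a record exists.
            "delta_vs_history_c": (
                None if past is None
                else reading.temperature_c - past["avg_temperature_c"]
            ),
        }


if __name__ == "__main__":
    stream = [
        Reading("CAT-001", "2024-01-01T00:00:00", 362.5),
        Reading("CAT-999", "2024-01-01T00:00:05", 340.0),  # no historical record
    ]
    for record in enrich(stream, HISTORICAL):
        print(record)
```

Because `enrich` is a generator, each reading is processed as soon as it arrives rather than after a batch has accumulated.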
Data provisioning pros and cons
Data Integration Total Cost of Ownership Really  Matters
Evolution Of an Informatics System
  1. “Just give me a list of compounds from the database, sorted by compound name”
  2. “We also need to see the related toxicology information and for the list to be grouped by compound”
  3. “We’d like to get a list of some of the related compound information, too, grouped by the first letter of the compound's name.”
  4. “Actually, we’d like to be able to produce a completely separate report for compound and related toxicology information.”
  5. “We don’t like running the reports manually. Can they be scheduled?”
  6. “We have quite a few users using this system now and there’s some fairly sensitive data in there.”
  7. “We need to be able to drill down into more detail”
  8. “We need to track which users have used what Protocols”
  9. “We need to be able to easily search the information we need.”
Further requests:
  • “We need these reports linked to our business process”
  • “We need to be able to approve or reject the reports”
  • “We need a single version of the truth”
  • “We don’t want to be waiting around for the results”
  • “We don’t want to be re-typing information from these reports into our other application”
  • “We need to be able to see the underlying detail”
  • “We need to print the reports out to take into meetings”
  • “We need the output as Excel”
  • “We need charts”
  • “We need to know who’s looked at the reports”
  • “We need a simple way to see the entire contents of the report”
  • “We need a report that looks like an existing flow chart”
Hidden Costs
  • Organizations that believe they can build a data integration solution at a fraction of the cost of a COTS solution discover that any savings in up-front costs are very quickly outweighed by costs incurred multiple times over the lifetime of the solution
  • Typical effort to build a custom data integration solution can be upwards of 5000-5500 man-days
  • Some of the tasks that need to be undertaken to provide a functioning solution: application architecture, data cleansing & enrichment services, integration framework, user interface design, common field matching, security, batch processing capabilities, application integration, audit & logging capabilities
Build versus Buy Decision Criteria
Data Integration Considerations | Build your own | Buy
Initial start-up cost | Lower | Higher
Ongoing operating cost | Higher | Lower
Ongoing support & maintenance | In-house responsibility | Vendor
One-time “quick and dirty” task | Consider | Maybe overkill unless one-time task becomes ongoing request
IT staff requirements | Higher | Lower
IT productivity | Detracts from | Contributes to
Data sources/data targets | Single/single | Multiple/multiple, multiple/single, single/multiple
Complex transformations | Limited: IT must write complex code | Comprehensive
Integration | Usually overlooked | Industry standards
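The first two rows of the comparison (lower start-up but higher operating cost for building, the reverse for buying) imply a break-even point. The sketch below shows how that point could be found; every cost figure in it is a hypothetical placeholder and none comes from the deck.

```python
def cumulative_cost(upfront, annual, years):
    """Total cost of ownership after a number of whole years."""
    return upfront + annual * years


def break_even_year(build, buy, horizon=20):
    """First whole year at which buying costs no more than building, else None.

    `build` and `buy` are (upfront_cost, annual_operating_cost) pairs.
    """
    for year in range(horizon + 1):
        if cumulative_cost(*buy, year) <= cumulative_cost(*build, year):
            return year
    return None


if __name__ == "__main__":
    # Hypothetical figures in USD, chosen only to illustrate the calculation.
    build = (100_000, 80_000)   # lower start-up cost, higher operating cost
    buy = (250_000, 30_000)     # higher start-up cost, lower operating cost
    print(break_even_year(build, buy))  # 3
```

With real figures for a given organization, the same comparison indicates how long a solution must stay in service before the higher purchase price pays for itself.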
Industry Trends End-user Informatics
Web 2.0: What’s Setting Expectations Today
Next-Generation Enabling Technologies & New User Demands Are Emerging
  • Rich Internet Experience: Web 2.0; portlet components; XML and derivatives; dynamic, Ajax-based UI
  • SOA Infrastructure: leverage existing systems and components; standardization; data-driven environment; open APIs to customize apps
  • Personal Dashboards: integrate data from multiple sources; multi-account views; cross-account planning
Web 2.0 features on our projects
Advanced Reporting/Visualization Collection
Scientific Business Process Management and PP
  • Fuse scientific and analytical data with process data
  • Use Pipeline Pilot in automated process decisions
  • Display reports and data at appropriate points in the process
  • Use data to modify process execution
Consolidated Informatics Platform (figure: Current vs. Future; labels: Many Databases, Many Tools, Consolidated Informatics Platform, Dashboards, Spreadsheets, Analytics, Scorecards, Self-service Reports, Data Mining, Portals, Web Reports)
Key Takeaways
  • Provide accurate, integrated & seamless Informatics solutions
  • Reduce redundant and replicated databases
  • Rationalize existing reporting tools and technologies
  • Build agile, flexible and reusable solutions
  • Empower the end-users
  • “Shift Right”
Shift Right

End User Informatics

  • 2. Informatics defined Informatics is the application of technology to bring Data, People and Systems together Bioinformatics is very Complex representation of Simple data Cheminformatics is very Simple representation of Complex data
  • 4. Problem Statement…. “ There's too much data and it's duplicated hundreds of times. The mistake companies make is that they start from the data they have. They need to ask what data do their users need and what are the questions they are asking. Understand the questions, how they can be answered and what kind of data is needed.” Quote by CIO of Major Corporation
  • 5. Integrated Solutions - Business Case: IDC White Paper Information Tasks Email – 14.5 hours a week Create documents – 13.3 hours a week Search – 9.5 hours a week Gather information for documents – 8.3 hours a week Find and organize documents – 6.8 hours a week Gartner: “Organizations spend an estimated $750 Billion annually seeking information necessary to do their job.”
  • 6. Time Wasted (per year) Reformat information - $57 million per 10,000 users Not finding information - $53 million per 10,000 users Recreating content - $45 Million per 10,000 users Data Integration- Business Case: IDC White Paper
  • 7. Reduce development costs, cycle times Increase employee efficiency Less time looking, more time doing Enhance communication Capture and reuse knowledge Innovate better & faster Cost of not finding right information Business – lost money, opportunities Data Integration - Business Case: General ROI issues IDC White Paper
  • 8. Key Takeaways Data Integration is not easy and represents ~80% of effort for a typical data integration project. Incompatible data are the largest, most expensive, and time-consuming portion of IT projects. Most data is in an unstructured format (outlook, word, PDF, images etc.)
  • 9. Evolution of data integration technologies
  • 10. Evolution of Integration Architectures Point to Point HUB + Spoke HUB + EII
  • 11. Defining EII, EAI, ETL Data Integration EII EAI Enterprise Information Integration Enterprise Application Integration Reports from multiple apps/data sources Transactions to multiple apps e.g. Real-time access to product silos for customers, employees e.g. Compound name change in one application propagated to other products EII ETL Real-time Batch Extract, Transform, Report in real-time Extract, Transform, Load; later report on data warehouse e.g. report data from operational applications e.g. build duplicate reporting data mart and/or redesign data warehouse
  • 12. Tools vs. Development Platform Enterprise Application Requirements Tools Development Platform
  • 13. What do end users really care about? The Internet has raised the bar for Informatics expectations Complex Query? Millions of Rows? Full table Scan? Users don’t really care. If they can view stock prices in real time, why not corporate data. In an ideal world, data analysis needs to be at speed of thought. Bigger, better, faster, cheaper
  • 14. Business users view Data Pipeline Pilot Reports
  • 16. Key Takeaways Provide an Integrated view of data across multiple systems; flat files, data warehouses , data marts. Avoid “boiling the ocean” Jump start data integration efforts with PP to quickly meet an important user requirement and then decide if the data should be persisted in a data warehouse or data mart. Use Pipeline Pilot to:
  • 17. Action from Insight Data is a New form of Energy
  • 18. Why is data integration so important? Data in any organization is distributed in various disconnected and disparate systems There is always a need to combine most current data with historical values The success of the internet has created data sources outside the internal network Data has informational value only when combined with other & related data
  • 19. WARNING SIGNS : Of Poor Data Integration Incomplete Data foundation Inability to consolidate data from multiple sources No single version of the truth Poor audit trail and data lineage Historical values not retained in a data warehouse or data mart Lack of integrated 360 deg view High cost of maintaining “one-time” in-house code Inability to comply with regulatory requirements Presentations or discussions that are prefaced with statements like “most of our analysis would have been accurate, except for the missing data from….” or “ Due to discovery of data not included in the last analysis , we are reversing our decision to……”
  • 20. WARNING SIGNS : Of Poor Data Integration Incomplete Data foundation Inability to consolidate data from multiple sources No single version of the truth Poor audit trail and data lineage Historical values not retained in a data warehouse or data mart Lack of integrated 360 deg view High cost of maintaining “one-time” in-house code Inability to comply with regulatory requirements As a result of an out-of-order condition for a critical chemical, a scientist must expedite the order and pay a premium price. When the chemical arrives the scientist (or worse her boss) discovers that another division had excess quantity of the same chemical and was looking to sell it at a discount.
  • 21. WARNING SIGNS : Of Poor Data Integration Incomplete Data foundation Inability to consolidate data from multiple sources No single version of the truth Poor audit trail and data lineage Historical values not retained in a data warehouse or data mart Lack of integrated 360 deg view High cost of maintaining “one-time” in-house code Inability to comply with regulatory requirements Scientists argue about the fact that analysis results differ-even though the data came from the same operational data source
  • 22. WARNING SIGNS : Of Poor Data Integration Incomplete Data foundation Inability to consolidate data from multiple sources No single version of the truth Poor audit trail and data lineage Historical values not retained in a data warehouse or data mart Lack of integrated 360 deg view High cost of maintaining “one-time” in-house code Inability to comply with regulatory requirements A technician alerts his management team of scientists to a potential problem discovered while running a query against a database. The technician cannot, however, answer the follow-up question , ” How long has the problem existed?”
  • 23. WARNING SIGNS : Of Poor Data Integration Incomplete Data foundation Inability to consolidate data from multiple sources No single version of the truth Poor audit trail and data lineage Historical values not retained in a data warehouse or data mart Lack of integrated 360 deg view High cost of maintaining “one-time” in-house code Inability to comply with regulatory requirements A Scientist runs a report every week against a LIMS, however to see a period-to-period comparison, the scientist maintains a spreadsheet into which he creates a new column every week and enters the data manually
  • 24. WARNING SIGNS of Poor Data Integration. Example: A customer calls technical support to enquire about a pending case. The customer support engineer has access to the case details but has no information on whether the customer is current on maintenance, how many end users they are licensed for, or which options the customer has purchased.
  • 25. WARNING SIGNS of Poor Data Integration. Example: Minor change requests take weeks to implement because any modification has to be thoroughly tested for accuracy and integrity.
  • 26. WARNING SIGNS of Poor Data Integration. Example: The CEO and CFO are uncomfortable signing off on the quarterly numbers because there is no way to trace the numbers back to the source systems.
  • 27. Case Study (closer to home): Services Order Report. Poor data quality; Redundant information; Duplicate entries; Hard to read; Huge amount of time required to clean it up.
  • 28. Information Quality is a Direct Function of: Information sensitivity; Data availability and accessibility; Data quality, where DQ = Completeness × Validity (e.g. a measure of completeness is the number of null values in a column; a measure of validity is “We have 4 regions, but there are 18 distinct values in the region column”). Pitfall: don’t take accountability for DQ on behalf of the source system; push accountability where it belongs, in the source system(s); Timeliness of data, relevant to the questions being asked by the user; SQL and programming accuracy.
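The DQ = Completeness × Validity measure on slide 28 can be sketched in a few lines. This is a minimal illustration only: the slide gives the product formula and two informal examples, so the exact definitions used here (completeness as the share of non-null values, validity as the share of non-null values found in an agreed list), the function names and the sample region data are all assumptions.

```python
def completeness(values):
    """Assumed definition: fraction of entries that are not null or blank."""
    if not values:
        return 0.0
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    return len(non_null) / len(values)


def validity(values, allowed):
    """Assumed definition: fraction of non-null entries found in the allowed set."""
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    if not non_null:
        return 0.0
    return sum(1 for v in non_null if v in allowed) / len(non_null)


def data_quality(values, allowed):
    """DQ = Completeness x Validity, as stated on the slide."""
    return completeness(values) * validity(values, allowed)


# Hypothetical "region" column: four agreed regions, but free-text variants crept in.
ALLOWED_REGIONS = {"North", "South", "East", "West"}
region_column = ["North", "South", "East", "West", "N.", None, "south", "West"]

# The slide's validity symptom: more distinct values than agreed regions.
distinct_values = {v for v in region_column if v is not None}

print("distinct region values:", len(distinct_values))
print("DQ score:", round(data_quality(region_column, ALLOWED_REGIONS), 3))
```

A score like this is only useful as a signal; in line with the pitfall noted on the slide, the fix for a low score belongs in the source system rather than in the report.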
  • 29. Case Study (closer to home): Internal Revenue Forecasting process. Orders QTD / Pipeline / Delivered / Forecast. Run the Services Products and Orders report in RSVPP ……; export the results, filter for product services (Column AM) and sum the Total Sale Price USD column. Run the Services Opportunities report in SFDC; export the result…… Assuming Access is up to date………..; export to Excel; filter by product services and sum the USD Amount column. Assuming Access is up to date, run the Total Forecast report; export to Access; …………
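The manual "export, filter, sum" steps on slide 29 are the kind of work that can be scripted once an export exists. The sketch below assumes the report has been exported as CSV. The `Total Sale Price USD` column is named on the slide; the `Product Type` column name, the `Services` filter value and the sample rows are hypothetical stand-ins, since the slide identifies the filter column only as Column AM.

```python
import csv
import io
from decimal import Decimal

# Hypothetical export; a real one would be read from the exported file.
SAMPLE_EXPORT = """Order ID,Product Type,Total Sale Price USD
1001,Services,1500.00
1002,Software,9000.00
1003,Services,2500.50
"""


def sum_filtered(csv_text, filter_column, filter_value, amount_column):
    """Sum amount_column over the rows whose filter_column equals filter_value."""
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[filter_column].strip() == filter_value:
            # Decimal avoids floating-point rounding on currency amounts.
            total += Decimal(row[amount_column])
    return total


orders_qtd = sum_filtered(
    SAMPLE_EXPORT,
    filter_column="Product Type",  # assumed name for "Column AM"
    filter_value="Services",       # assumed filter value
    amount_column="Total Sale Price USD",
)
print("Orders QTD (services):", orders_qtd)
```

Scripting the step removes the retyping, but it still depends on the export being current, which is the underlying weakness the case study points at.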
  • 31. Extract, Transformation & Load = Push big data: Batch extract from transaction systems; Bulk transformation; Push load into data warehouse. (Diagram labels: Extract, Transformation, Load, Data Warehouse, Real Time.)
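The three ETL stages named on slide 31 (batch extract, bulk transformation, push load) can be illustrated with a small sketch. It uses two in-memory SQLite databases as stand-ins for a transaction system and a data warehouse; the table names `orders` and `fact_orders`, the columns and the cleanup rules are hypothetical and not taken from the presentation.

```python
import sqlite3


def run_batch_etl(source, warehouse):
    """Extract all orders, clean them in bulk, and load them into the warehouse."""
    # Extract: one batch read from the transaction system.
    rows = source.execute(
        "SELECT order_id, region, amount_usd FROM orders"
    ).fetchall()

    # Transform: normalise region names and drop rows with no amount.
    transformed = [
        (order_id, (region or "").strip().title(), float(amount))
        for order_id, region, amount in rows
        if amount is not None
    ]

    # Load: push the cleaned batch into the warehouse in one transaction.
    with warehouse:
        warehouse.executemany(
            "INSERT INTO fact_orders (order_id, region, amount_usd) VALUES (?, ?, ?)",
            transformed,
        )
    return len(transformed)


# Hypothetical transaction system with untidy data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount_usd REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " north ", 100.0), (2, "SOUTH", None), (3, "east", 250.5)],
)

# Hypothetical warehouse target.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, region TEXT, amount_usd REAL)"
)

loaded = run_batch_etl(source, warehouse)
print("rows loaded:", loaded)
```

Because the whole batch is copied, the warehouse is only as fresh as the last run, which is the contrast with real-time access that the following slide draws.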
  • 32. Pipeline Pilot and Real-Time Data Access. (Diagram labels: Data Access; Data Adapters; Data Transformation; Transform; Calculate; Security; Relational; Flat Files; ERP; Legacy; EJB; XML; Information Access; Web Services; ODBC; JDBC.) Flexible data access capabilities; Single access point to data; Consumer sees only the end result; Shared platform service; Available to all technologies; Reusable building blocks; Targeted to specific needs; Reduces costs and time to market; Supports incremental development.
  • 33. Case Study: PI Historian. PI Historian, a product provided by OSI, captures data in real time from the research test rigs. Data capture in PI is triggered by events. PP allows scientists to read the data from PI Historian as it becomes available and also to combine it with other information (e.g. associate real-time test data with historical characteristics of a catalyst).
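The idea on slide 33 of associating real-time test data with historical catalyst characteristics amounts to enriching each reading as it arrives with a lookup against reference data. The sketch below shows that pattern in plain Python. It does not use the PI Historian or Pipeline Pilot interfaces; the field names, catalyst identifiers and values are all hypothetical.

```python
# Hypothetical reference data: historical characteristics keyed by catalyst ID.
CATALYST_HISTORY = {
    "CAT-001": {"surface_area_m2_g": 210.0, "batch_year": 2006},
    "CAT-002": {"surface_area_m2_g": 185.5, "batch_year": 2007},
}


def enrich_readings(readings, catalyst_history):
    """Yield each incoming reading merged with its catalyst's known history.

    Readings for an unknown catalyst pass through unchanged.
    """
    for reading in readings:
        characteristics = catalyst_history.get(reading["catalyst_id"], {})
        yield {**reading, **characteristics}


# Stand-in for event-triggered readings coming off a test rig.
live_readings = [
    {"catalyst_id": "CAT-001", "timestamp": "2008-01-15T10:00:00", "temperature_c": 351.2},
    {"catalyst_id": "CAT-999", "timestamp": "2008-01-15T10:00:05", "temperature_c": 349.8},
]

enriched = list(enrich_readings(live_readings, CATALYST_HISTORY))
for record in enriched:
    print(record)
```

Using a generator means each reading is enriched when it is consumed, so the same function works whether the input is a stored list or a live feed.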
  • 35. Data Integration Total Cost of Ownership Really Matters
  • 36. Evolution of an Informatics System (request 1): “Just give me a list of compounds from the database, sorted by compound name”
  • 37. Evolution of an Informatics System (request 2): “We also need to see the related toxicology information and for the list to be grouped by compound”
  • 38. Evolution of an Informatics System (request 3): “We’d like to get a list of some of the related compound information, too, grouped by the first letter of the compound’s name.”
  • 39. Evolution of an Informatics System (request 4): “Actually, we’d like to be able to produce a completely separate report for compound and related toxicology information.”
  • 40. Evolution of an Informatics System (request 5): “We don’t like running the reports manually. Can they be scheduled?”
  • 41. Evolution of an Informatics System (request 6): “We have quite a few users using this system now and there’s some fairly sensitive data in there.”
  • 42. Evolution of an Informatics System (request 7): “We need to be able to drill down into more detail”
  • 43. Evolution of an Informatics System (request 8): “We need to track which users have used what Protocols”
  • 44. Evolution of an Informatics System (request 9): “We need to be able to easily search the information we need.”
  • 45. Evolution of an Informatics System: “We need these reports linked to our business process”; “We need to be able to approve or reject the reports”; “We need a single version of the truth”; “We don’t want to be waiting around for the results”; “We don’t want to be re-typing information from these reports into our other application”; “We need to be able to see the underlying detail”; “We need to print the reports out to take into meetings”; “We need the output as Excel”; “We need charts”; “We need to know who’s looked at the reports”; “We need a simple way to see the entire contents of the report”; “We need a report that looks like an existing flow chart”
  • 46. Hidden Costs. Organizations that believe they can build a data integration solution at a fraction of the cost of a COTS solution discover that any savings in up-front costs are very quickly spent multiple times over during the lifetime of the solution. Typical effort to build a custom data integration solution can be upwards of 5,000-5,500 man-days. Some of the tasks that need to be undertaken to provide a functioning solution: Application architecture; Data cleansing & enrichment services; Integration framework; User interface design; Common field matching; Security; Batch processing capabilities; Application integration; Audit & logging capabilities.
  • 47. Build versus Buy Decision Criteria (data integration considerations, build your own vs. buy). Initial start-up cost: Build = Lower, Buy = Higher. Ongoing operating cost: Build = Higher, Buy = Lower. Ongoing support & maintenance: Build = In-house responsibility, Buy = Vendor. One-time “quick and dirty” task: Build = Consider, Buy = Maybe overkill unless the one-time task becomes an ongoing request. IT staff requirements: Build = Higher, Buy = Lower. IT productivity: Build = Detracts from, Buy = Contributes to. Data sources/data targets: Build = Single/single, Buy = Multiple/multiple, multiple/single, single/multiple. Complex transformations: Build = Limited (IT must write complex code), Buy = Comprehensive. Integration: Build = Usually overlooked, Buy = Industry standards.
  • 49. Web 2.0: What’s Setting Expectations Today
  • 50. Next-Generation Enabling Technologies & New User Demands Are Emerging. Rich Internet Experience: Web 2.0; Portlet components; XML and derivatives; Dynamic, Ajax-based UI. SOA Infrastructure: Leverage existing systems and components; Standardization; Data-driven environment; Open APIs to customize apps. Personal Dashboards: Integrate data from multiple sources; Multi-account views; Cross-account planning.
  • 51. Web 2.0 features on our projects
  • 52. Web 2.0 features on our projects
  • 54. Scientific Business Process Management and PP: Fuse scientific and analytical data with process data; Use Pipeline Pilot in automated process decisions; Display reports and data at appropriate points in the process; Use data to modify process execution.
  • 55. Consolidated Informatics Platform (diagram comparing Current and Future states; labels: Many Databases, Many Tools, Spreadsheets, Web Reports, Dashboards, Analytics, Scorecards, Self-service Reports, Data Mining, Portals, Consolidated Informatics Platform)
  • 56. Key Takeaways: Provide accurate, integrated & seamless informatics solutions; Reduce redundant and replicated databases; Rationalize existing reporting tools and technologies; Build agile, flexible and reusable solutions; Empower the end users; “Shift Right”

Editor's Notes

  • #5: Most end users think their data requirements are unique, which is not the case. Within an organization there is a pattern to the data requests. Avoid creating data silos.
  • #11: EAI technology was developed to enhance application-level information exchange. An EAI message is hard-coded into an application’s logic and is efficient only at exchanging messages, carrying information and data from one application to another. EAI solutions have no means of optimizing queries where large datasets are involved. EII: it can present non-relational data as if it were in relational format.
  • #14: Our objective and goal is to satisfy, or come close to satisfying, the ever-growing and insatiable demand of end users for information. Enterprise data continues to grow exponentially every year. In the new world of iPhones, where users have the illusion that they can touch and feel data, latency is extremely irritating. Utopia is a zero-latency environment, with reports being made available at the snap of a finger. As we go further into the presentation we’ll highlight what reality is. Insert two slides: end-user perspective of architecture vs. IT
  • #29: End users have a very low tolerance for irrelevant and erroneous information on reports. We have often heard that report adoption is low; in fact, it is the relevance and accuracy of the information being presented that drives adoption. Report acceptance by end users is directly dependent on the following factors.
  • #37: They start slowly, are not well done, and are usually irreversible.
  • #50: An RIA is a cross between Web applications and traditional desktop applications, transferring some of the processing to the client end. AJAX-enabled dynamic web functionality: sliding bars, live graphs, personalized rollover/hover content, etc. The user experience on Google, eBay, Netflix and Yahoo is what users expect from a web experience.
  • #51: The IT community is seeing RIAs as successful models for creating lightweight front-ends for SOAs.
  • #55: Informatics/reporting is a process, a fact that other BI vendors fail to recognize. To integrate data, one must integrate processes.