Confidential & Proprietary www.dclab.com
There’s Gold in Them Thar Data!
Leveraging the Value of Your Legacy Content
Greg Fagan,
Sales Director,
Data Conversion Laboratory (DCL)
Valuable Content Transformed
• Document Digitization
• XML and HTML Conversion
• eBook Production
• Hosted Solutions
• Big Data Automation
• Conversion Management
• Editorial Services
• Harmonizer
Experience the DCL Difference
DCL blends years of conversion experience with cutting-edge technology and the
infrastructure to make the process easy and efficient.
• World-Class Services
• Leading-Edge Technology
• Unparalleled Infrastructure
• US-Based Management
• Complex-Content Expertise
• 24/7 Online Project Tracking
• Automated Quality Control
• Global Capabilities
We Serve a Very Broad Client Base . . .
. . . Spanning All Industries
• Aerospace
• Associations
• Defense
• Distribution
• Education
• Financial
• Government
• Libraries
• Life Sciences
• Manufacturing
• Medical
• Museums
• Periodicals
• Professional
• Publishing
• Reference
• Research
• Societies
• Software
• STM
• Technology
• Telecommunications
• Universities
• Utilities
What’s in Your Archives?
• Companies are focused on delivering new content
• Are they overlooking what they already have and missing opportunities?
• The digital age has increased potential audience size and demand for content
• Don’t ignore what you already have – it may be highly valuable
Which Format Is Your Data in?
• Paper
• Microfilm
• Photographs and/or slides
• Electronic files
• Some combination of the above
• Can you find it easily?
Tackling My Archival Content Is Scary!
What’s Causing Your Fear and Anxiety?
• Task seems too big
• Who takes ownership?
• Cost in dollars and staff resources
• Will take forever to complete
Perception vs. Reality
• Perception: Converting legacy data will be costly, labor-intensive, and too complex to manage. The ROI is questionable, so it’s not worth the risk
• Reality: Proper analysis and planning, a customized process, and the expertise of a trusted partner limit the risk and ensure maximum ROI
Consider Your Options
• Convert nothing
• Convert everything
• Convert high-priority content
Where Do I Begin?
• Identify what you have
• Determine your target audience
• Decide how you’ll distribute the digital content (Web and/or mobile; subscription model or single purchase; discrete pieces of content, e.g., images)
• Develop your business case (costs, markets, revenue)
• Start converting!
Customer Case Studies
Case Study – Optical Society of America
Converting a Large Content Repository
Customer Problem
• OSA needed to build a flexible digital repository of its authoritative library of scientific journals going back to 1917: 750,000 pages spanning almost 100 years
• The materials incorporated extensive math, tables, and images in multiple formats, which needed to be built into a cohesive database that would facilitate new approaches to dissemination and the creation of future products not yet conceived
Solution
• Flexibility in execution: the size and breadth of the collection made it impractical to develop full specifications in advance
• Develop an overall specification, with allowance for change as new scenarios are discovered
• Software development sprints to incorporate changes, plus frequent review meetings that allowed the assessment of nuances in new materials as they came up. Close collaboration to manage new situations
Results
• A three-year project delivered on schedule and on budget, with new products already developed and on the market
• The close collaboration and involvement of the client shaved 6-8 months off the project schedule and created a product that meets all goals
Case Study – Elsevier
Automating Large-Scale Reference Conversion
Customer Problem
• Need to improve the content coverage and link density of their Scopus bibliographic database, beginning with their back-list of articles published prior to 1996
  • Inventory over 5.5 million Elsevier files against over 3 million Scopus records
  • Convert over 50 million references to a standard XML format. Source content consists of multiple variations of a source DTD with differing levels of quality, including totally unstructured references
  • Link as many references as possible to the Scopus repository
Solution
• Automated solution to inventory the large archive and provide comprehensive inventory reporting
• Developed a fully automated, multi-step solution, running 24/7, to process the source content and return high-quality, converted, validated, and enriched references, improving the match rate to Scopus
Results
• Decomposed over 1 million unstructured references based on pattern detection software
• Heuristically repaired and converted the source content to Elsevier CARS XML
• Validated each reference against Scopus, CrossRef, or PubMed and enriched the content based on the results, as appropriate
• Packaged and delivered the final XML for ingestion into the Elsevier Scopus System
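To give a feel for what "decomposing unstructured references based on pattern detection" can mean in practice, here is a minimal sketch in Python. It is an illustration only, not DCL's software and not the Elsevier CARS XML format: the regular expression, the element names, and the sample reference are all assumptions made up for this example. It handles one citation layout and flags everything else for another pattern or for manual review.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for ONE citation layout (an assumption for this sketch):
#   Authors (Year). Title. Journal, Volume, FirstPage-LastPage.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>[^,]+),\s+"
    r"(?P<volume>\d+),\s+"
    r"(?P<first_page>\d+)\s*[-–]\s*(?P<last_page>\d+)\.?$"
)


def decompose_reference(raw: str) -> ET.Element:
    """Turn one raw reference string into a small XML element.

    Strings that do not match the pattern are kept verbatim and flagged,
    so they can be routed to another pattern or to manual review.
    """
    text = raw.strip()
    ref = ET.Element("reference")  # element names are illustrative only
    match = REFERENCE_PATTERN.match(text)
    if match is None:
        ref.set("status", "unparsed")
        ET.SubElement(ref, "raw").text = text
        return ref
    ref.set("status", "parsed")
    for field, value in match.groupdict().items():
        ET.SubElement(ref, field).text = value.strip()
    return ref


if __name__ == "__main__":
    # Invented sample strings, not taken from any real archive.
    samples = [
        "Smith, J. and Lee, K. (1987). Light scattering in thin films. "
        "Journal of Examples, 12, 345-352.",
        "See also the 1954 annual report",
    ]
    for sample in samples:
        print(ET.tostring(decompose_reference(sample), encoding="unicode"))
```

A real pipeline at this scale would presumably need many such patterns plus the repair, validation, and enrichment steps listed in the results; the point of the sketch is only that each reference ends up either as structured fields or as a clearly flagged exception.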
Benefits and Drivers for Content Conversion
The Value of Structured Content
Increase Revenues
• Improve customer service
• Decrease time to market
• Expand into new markets
• Create data versatility
• Enhance discoverability
Decrease Expenses
• Increase authoring productivity
• Reduce publishing costs
• Increase information reuse
• Reduce translation costs
• Future-proof data
Successful business strategies are driven by content!
Can your content keep up with changing technology?
• Data drives every aspect of a business, from engineering and development to maintenance, repair and operations, sales, customer service, marketing, and more
• Documents are often converted in order to comply with law or industry standards, or to support distribution partners and meet consumers' expectations
• Data conversion is most desirable for its potential to lower costs by making data easier to manage, update, reproduce, and syndicate
• Structured formatting enables content to be delivered anywhere, at any time, on any device imaginable
Various Uses for Structured Content
• Re-purposing: Creating new versions of data suitable for derivative uses (e.g., the web, diagnostic equipment, hand-held devices, voice devices)
• Searching: Ability to find information through text searches and through more advanced searches that depend on context and “understanding”
• Component Reuse: Ability to reuse portions of data for different products and different documentation sets
• Enforce Data Standards: Ability to ensure that information is produced consistently and meets corporate standards
• Interchange with Vendors, Customers, & World: Ability for others to use your information for communications with others and to incorporate it into products belonging to other organizations
Key Takeaways
• DON’T let your valuable content lie dormant
• Convert it into a structured format that supports the needs of your business
• Mine that gold!
Q&A
Greg Fagan
Sales Director,
Data Conversion Laboratory
(908) 723-1884
gfagan@dclab.com
@dclaboratory

Editor's Notes

  • #2: Good afternoon, everyone! Thanks for joining us for this webinar. Today we’re going to talk about finding the value in your legacy content to create new products and new revenue streams. I’m Greg Fagan, and I’m the Sales Director for the publishing and financial industries at DCL. Because you’re all busy people, I’ve tried to keep this presentation as concise as possible. I’ll talk for about 15-20 minutes and then open the floor to your questions.
  • #3: Just some quick background information on DCL. We’re content conversion experts. We take content in any format you might have it and convert it to reusable formats for digital output such as XML, SGML, HTML5, DITA, and EPUB. We not only convert your content, but we can enrich it to make it more discoverable, usable, and deliverable to any output format or device. Aside from conversion, we offer a suite of services, including hosting, editorial services, and project management.
  • #4: Our deep experience, sophisticated infrastructure, and ferocious commitment to quality are what set us apart from the pack.
  • #5: We serve a broad range of clients. Myriad large, global companies from many different sectors entrust their content to us.
  • #6: And our clients span a wide array of industries, which speaks to our familiarity and fluency with many different XML schemas. Publishers, societies, pharmaceutical companies, defense contractors, and government agencies are just a few of the types of clients and industries we serve.
  • #7: Most businesses and organizations, be they in publishing, financial services, pharmaceuticals, aerospace, or most other sectors, are focused on how they’re going to produce and deliver new content or data, and they should be. But many of these same organizations also have decades’ worth of legacy content, and in many cases, it’s sitting there untapped when it could, if properly digitized and structured, be creating value and enhancing their business. Digitization and the ever-expanding list of digital delivery channels have vastly increased potential audience size and demand for content. Legacy content is often converted in order to comply with legal or industry standards or to support distribution partners and meet consumers' expectations. Generally, however, legacy conversion is most desirable for its potential to lower costs by making data easier to manage, update, reproduce, and syndicate. Given that, it’s simply bad business to ignore your archives.
  • #8: Legacy content exists in many forms. There’s paper, like hard-copy books, journals, and newspapers; microfilm; photographs and/or slides; electronic files (PDF, Word); or various combinations of the above. In all likelihood, you can’t find it very easily, even if it’s in digital form. It’s sitting in boxes in storerooms or basements, or on shelves, or in 50 different subdirectories on your network. Think about your own legacy content. Which formats do you have? How is it stored? Is it retrievable? We’ve seen all kinds of legacy material, including mountaineering maps and images, letters and papers from famous people that have been contributed to a university library, specialized image collections, diaries of Civil War officers, scientific journals dating back decades and even centuries, vintage car repair manuals, movie magazines from the golden age of cinema...the list is endless.
  • #9: The thought of actually organizing and reviewing all this data is daunting and downright scary. But the truth is, it doesn’t have to be.
  • #10: So what’s behind this fear and anxiety? Well, if your organization has decades’ worth of legacy content in various formats and saved in many different places, the task of compiling, organizing, and converting it seems Herculean. And who will take ownership and drive this huge project from start to finish? Finally, even from a high-level overview, it seems the cost in dollars and in staff and management time will be prohibitive, and it will take ages to complete. These are legitimate and understandable concerns. How do we review and analyze all that content? Do we have the right people on staff to drive it? Can we properly estimate costs and secure the budget to proceed? These are all good questions, but they need not make you flee in fear.
  • #11: Often perception and reality are very different things. In many cases, the perception represents a distorted view. In most cases, careful planning, a customized process, and the help of a knowledgeable and trusted partner minimize the risk and ensure maximum return on investment. Converting legacy data is an investment that results in increased revenues and decreased expenses. Not only will having data maintained in a more structured, easily configurable format improve customer service and decrease time to market, it will allow for expansion into new markets and create data versatility as well. Additionally, publishing and translation costs will be reduced and authoring productivity and information reuse will be increased. If you want to realize these benefits, it’s critical in my view to work with a vendor that has the content expertise and technological sophistication to help you manage your conversion successfully.
  • #12: You have three options when considering legacy conversion and calculating expected ROI: 1) Convert nothing: This will result in delayed or no ROI. 2) Convert everything: This will result in higher conversion costs and a potentially lower ROI. 3) Convert top-priority content: This is the best option to start with, as there will be some conversion costs but a maximized ROI. It will take some effort on your part to identify your high-priority content, but it’s a worthwhile exercise that will pay real dividends. You can always convert the remaining content later if it stands up to the same cost-benefit analysis as your top-priority content. Converting nothing is only a sensible option if, in your judgment, your legacy content has no potential value in digital formats. And is there any organization that can say that?
  • #13: So how do you start? The first step is to identify what you have in terms of volume, formats, completeness, and overall condition (e.g., old paper, incomplete files, etc.). Doing this helps determine value and cost. Then you need to think about your target audience and what they’ll need, which might require user surveys and focus groups. Keep in mind that once the content is discoverable, the audience will likely be larger than you might think. Next you’ll need to decide how the content will be distributed. Will it be offered across all platforms and devices? Subscription model or single purchase? Sold to libraries, consortia, and corporations, or to individuals? Then develop your business case. Think about all the available alternatives in creating your digital content and estimate costs. Determine your potential markets and projected revenue. Some of this will be guesswork, but it’s important to set measurable goals. Finally, get off the starting line and run to daylight! (That’s a metaphor; there’s no actual running involved.)
  • #15: Here’s an example of a large legacy conversion that DCL performed for the Optical Society of America. OSA needed to build a flexible digital repository of its journal content going back to 1917, which comprised 750,000 pages. The content included extensive math, tables, and images in various formats, so we purposely kept the specs fluid to accommodate new content types that arose. This illustrates the point that every legacy content collection is unique and thus requires a customized solution. The new XML repository has already yielded a revenue-generating spinoff image bank. And that highlights one of the real benefits of having a structured content repository – the ability to create new products and revenue streams.
  • #16: Elsevier wanted to enrich the references in its Scopus bibliographic database, which is their homegrown version of PubMed and CrossRef, beginning with their backlist of articles published prior to 1996. We’re talking about 5 and a half million articles that needed to be inventoried and over 50 million references that needed to be converted to XML. Many of the references were completely unstructured; that doesn’t work well with XML, which is all about structure. We devised an automated inventory and reporting solution, along with an automated process that decomposed the unstructured references into more granular elements and then recomposed them into valid CARS XML. Once the repair process was complete, the references could be validated and linked to Scopus, CrossRef, or PubMed. This increased the value of the database not only to researchers, but also to current and potential institutional subscribers.
  • #18: Well-structured content has many benefits, with the most important being that it can increase revenue by decreasing time to market and enabling new product development. It also decreases expenses, such as publishing and translation costs, over time, which makes it a smart investment. Often legacy content is more complex and difficult to manage than new content. In many cases, it was designed for one specific output and not much thought was given to proper storage, retrieval, or reusability. There are also different document types, formats, and levels of complexity, like heavy math and tabular material that was never meant for digital output. This is where the help of a trusted partner can be invaluable in helping you identify, categorize, and convert your content to a well-structured format. Your content should drive your business strategy.
  • #19: But you can’t structure your content and think your work is done. It’s an ongoing process to keep up with industry standards, compliance, and constantly evolving outputs. Once the major work is done, however, the changes are much easier to manage, and your content is ready for delivery to any output. Content drives every aspect of your business, so make sure yours is ready to take you in the right direction.
  • #20: Structured content has many uses, with reuse and repurposing the most important in my mind. Why? Because they generate revenue. The others are important, too. Their relative importance varies from industry to industry, but money talks in all of them. When your content is structured at a granular level, you can assemble the different components into new products, as the OSA did with the creation of the image bank that I referred to. That wouldn’t have happened if they hadn’t taken the step to convert their legacy content. That’s just one example; there are many, many possibilities once your content has been converted to a structured format.
  • #21: [Read bullets.] Once you get started, it’s easier than you might think!
  • #22: I’d like to thank you for tuning in today. Feel free to contact me directly anytime. Now I’m happy to take your questions.
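
As a supplement to note #16, the idea of decomposing an unstructured reference into granular elements and recomposing it as XML can be sketched roughly as follows. This is only an illustration under stated assumptions, not the process DCL built for Elsevier: the citation pattern, the sample reference, the function names `decompose` and `recompose`, and the element names are all hypothetical, and the output does not follow the actual CARS schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element names for illustration; NOT the real CARS schema.
FIELDS = ("authors", "title", "journal", "volume", "pages", "year")

# Assumed pattern for one simple citation style:
# "Authors. Title. Journal Volume, Pages (Year)."
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\s+"
    r"(?P<volume>\d+),\s+"
    r"(?P<pages>\d+-\d+)\s+"
    r"\((?P<year>\d{4})\)\.?$"
)


def decompose(raw_reference):
    """Split a raw reference string into named parts, or return None."""
    match = REFERENCE_PATTERN.match(raw_reference.strip())
    return match.groupdict() if match else None


def recompose(fields):
    """Build an XML string with one child element per extracted field."""
    reference = ET.Element("reference")
    for name in FIELDS:
        ET.SubElement(reference, name).text = fields[name]
    return ET.tostring(reference, encoding="unicode")


# Made-up sample reference, used only to exercise the sketch.
raw = ("Smith J, Jones A. Optical properties of thin films. "
       "J Opt Soc Am 12, 345-350 (1925).")
fields = decompose(raw)
# References that do not match the pattern yield None and would need
# another pattern or manual review.
xml_reference = recompose(fields) if fields else None
print(xml_reference)
```

A real backlist would mix many citation styles, so a single pattern like this would only cover a fraction of the references; the point is the two-step shape of the work (split into parts, then emit structured markup that can be validated).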