Adding Data Sources
to the Reporter
Evergreen International
Conference 2015
Amount of Stuff
• As of 4/14/15 – 696 tables in my production database
• 5,505 columns
• Over 30 million theoretical relationships
• Theory, not reality
• Still, there is a lot there
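For a rough sense of where numbers like these come from, the counts can be pulled from PostgreSQL's standard information_schema catalog. This is a sketch under assumptions, not necessarily how the figures above were produced: which schemas to exclude and whether views count are choices made here for illustration. Pairing every column with every other column gives n × (n − 1) ordered pairs, which for 5,505 columns is 30,299,520.

```sql
-- Count ordinary tables outside the system schemas.
select count(*) as table_count
from information_schema.tables
where table_type = 'BASE TABLE'
  and table_schema not in ('pg_catalog', 'information_schema');

-- Count the columns of those tables, and the number of ordered column pairs
-- (every column paired with every other column).
select count(*) as column_count,
       count(*) * (count(*) - 1) as theoretical_relationships
from information_schema.columns c
join information_schema.tables t
  on t.table_schema = c.table_schema
 and t.table_name = c.table_name
where t.table_type = 'BASE TABLE'
  and t.table_schema not in ('pg_catalog', 'information_schema');
```

Note that information_schema only lists objects the connected role can access, so the totals depend on who runs the query.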
Do you want to do it?
Two Types of Change
Addition vs. Alteration
Overhead
Additions. Keeping track of
your changes and adding
them back in after releases.
Alterations. Keeping track of your
changes, Evergreen’s, resolving
conflicts and adding them back in.
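The difference between the two kinds of change can be sketched in SQL. The zip code table and the school ID column are hypothetical names chosen for illustration; actor.usr is the stock Evergreen patron table.

```sql
-- Addition: a new, standalone table. Nothing shipped with Evergreen changes,
-- so after a release you only need to confirm it is still there.
-- Table and column names are hypothetical.
create table extend_reporter.zip_code_info (
    zip    text primary key,
    city   text,
    county text
);

-- Alteration: a change to a stock Evergreen table. A release may also change
-- actor.usr, so this has to be tracked and reconciled after each upgrade.
-- The column name is hypothetical.
alter table actor.usr add column school_id text;
```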
“Was the population of British citizens in
India, in the 1800s, a significant part of
the empire’s population?”
How to Answer a Reference Question in
SQL (and Stretch a Metaphor)
1. Phase 1 – Ask the question.
2. Phase 2 – Allocate resources.
3. Phase 3 – Model the data (and query).
4. Phase 4 – Implement new data sources.
5. Phase 5 – Profit (or at least rejoice).
An Example
Phase 1 – Ask the question.
The Question
“How many Hugo and Nebula award-winning
novels do I have circulating?”
Is that the right question?
Broadening the scope of the question helps.
You don’t want to end up on a never-ending
road of new questions, but you do want to answer
the question this time in a way that can easily be
reused. Too narrow a scope creates a solution
that only answers one question.
Phase 2 – Allocate resources.
Allocating
• Gather external data: reference assistant
• Rough prototype of data structure for SQL table, AKA Google Sheet: me
• Someone in IT or a data-savvy librarian to design tables and upload data
• Someone to write queries
Short vs. Long Term
Internal vs. External Data
Internal data is usually a new view, or otherwise
uses only data internal to Evergreen. You can
usually make this a self-perpetuating source.
External data will have to be maintained. Do you
have the resources and commitment to do this?
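As a sketch of an internal source, a view over existing Evergreen tables stays current on its own because it is computed each time it is queried. The view name and the choice of columns are assumptions for illustration; asset.copy and asset.call_number are stock tables.

```sql
-- Hypothetical view: how many non-deleted copies each library holds per record.
create view extend_reporter.copies_per_record as
select acn.record,
       ac.circ_lib,
       count(*) as copy_count
from asset.copy ac
join asset.call_number acn on acn.id = ac.call_number
where not ac.deleted
  and not acn.deleted
group by acn.record, ac.circ_lib;
```

Nothing has to be uploaded or refreshed by hand here, which is the maintenance difference from an external source.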
What I did.
• I started a Google Sheet with the columns I
thought I needed: award, category, author,
title, year and notes.
• I pointed a reference assistant at it and said, “Fill
it in for novels for the last thirty years.”
• Then I reviewed it.
Phase 3 – Model the data (and query).
Everyone must work together at this stage
to understand what can be stored and what
can be queried.
1 Table or 3
I could have created a single table, but instead
I decided to go with three.
• Awards Table: Author, Title
• Reporter SSR: Author, Title, Record
• Call Numbers: Record
• Copies: Call Number, Circulating Library
• Org Unit
Testing the Query
• Now I’m going to prototype in SQL what I want to
do later in the reporter.
• SQL is faster for prototyping than the reporter, and it
lets me know if anything critical needs to change
before I have more to change.
• Or if you decide to scrap it, it’s that much
sooner you start catching up on House
of Cards.
A Quick SQL Query
-- Prototype: Hugo winners matched to the catalog, with the libraries holding copies.
select era.year, era.author, era.title, era.notes, cat.category,
       agency.agency,
       array_agg(aou.shortname) as reporterstuff
from extend_reporter.awards era
join extend_reporter.awards_agency agency on agency.id = era.agency
left join extend_reporter.awards_category cat on cat.id = era.category
-- Match award rows to bib records on exact author and title text.
join reporter.super_simple_record ssr
  on ssr.author = era.author and ssr.title = era.title
-- ssr.id is the bib record id, so call numbers join on their record column.
join asset.call_number acn on acn.record = ssr.id
join asset.copy ac on ac.call_number = acn.id
join actor.org_unit aou on aou.id = ac.circ_lib
where agency.agency ilike '%hugo%'
group by 1, 2, 3, 4, 5, 6;
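Because the prototype matches award rows to the catalog on exact author and title text, it can be worth checking which award rows match nothing at all. The query below is a minimal sketch of that check, using only the tables already named in the prototype; whether a miss means the title is not owned or the text is simply spelled differently has to be judged by a person.

```sql
-- Award rows with no catalog record carrying the same author and title text.
select era.year, era.author, era.title
from extend_reporter.awards era
where not exists (
    select 1
    from reporter.super_simple_record ssr
    where ssr.author = era.author
      and ssr.title = era.title
)
order by era.year, era.title;
```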
Results
Phase 4 – Implement new data sources.
Remember when I said this wouldn’t
be about the technology so much?
-- Lookup table of awarding agencies.
create table extend_reporter.awards_agency (
    id serial PRIMARY KEY,
    agency text
);
-- Lookup table of award categories.
create table extend_reporter.awards_category (
    id serial PRIMARY KEY,
    category text
);
-- One row per award winner; id is the primary key the field mapper entry refers to.
create table extend_reporter.awards (
    id serial PRIMARY KEY,
    agency INT NOT NULL REFERENCES extend_reporter.awards_agency (id),
    category INT REFERENCES extend_reporter.awards_category (id),
    year integer NOT NULL,
    author text NOT NULL,
    title text NOT NULL,
    notes text
);
1) Create tables.
2) Upload data.
3) Create field mapper entries (fm_IDL.xml).
4) Update IDLs on server (both).
5) Run autogen.sh.
6) Restart services, Apache and clark-kent.
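Step 2 above can be sketched as loading the flat spreadsheet into a staging table and then splitting it across the three tables. The staging table name, the CSV file name, and the sample row are assumptions for illustration, and the sketch assumes a one-time load into empty tables.

```sql
-- Staging table mirroring the spreadsheet columns (hypothetical name).
create temp table awards_staging (
    award    text,
    category text,
    author   text,
    title    text,
    year     integer,
    notes    text
);

-- In psql, an exported sheet could be loaded with a meta-command such as:
--   \copy awards_staging from 'awards.csv' with (format csv, header true)
-- (file name hypothetical). One illustrative row is inserted here instead.
insert into awards_staging (award, category, author, title, year, notes)
values ('Hugo Award', 'Novel', 'Ann Leckie', 'Ancillary Justice', 2014, null);

-- Fill the two lookup tables from the distinct values in the sheet.
insert into extend_reporter.awards_agency (agency)
select distinct award from awards_staging;

insert into extend_reporter.awards_category (category)
select distinct category from awards_staging where category is not null;

-- Fill the main table, replacing the text values with lookup ids.
insert into extend_reporter.awards (agency, category, year, author, title, notes)
select ag.id, cat.id, s.year, s.author, s.title, s.notes
from awards_staging s
join extend_reporter.awards_agency ag on ag.agency = s.award
left join extend_reporter.awards_category cat on cat.category = s.category;
```

For the catalog join to find anything, the author and title strings need to be entered the way they appear in reporter.super_simple_record.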
fm_IDL.xml
• AKA the Field Mapper
• /openils/var/web/reports/fm_IDL.xml
• /openils/conf/fm_IDL.xml
What does it do?
1. Controls how source information
appears in the reporter.
2. Controls how sources connect in
the reporter.
<class id="eraw" controller="open-ils.cstore open-ils.pcrud"
    oils_obj:fieldmapper="extend_reporter::awards"
    oils_persist:tablename="extend_reporter.awards"
    reporter:core="false" reporter:label="Award Winners">
  <fields oils_persist:primary="id">
    <field reporter:label="All Award Agencies" name="agency"
        oils_persist:virtual="true" reporter:datatype="link"/>
    <field reporter:label="All Award Categories" name="category"
        oils_persist:virtual="true" reporter:datatype="link"/>
    <field reporter:label="Year of Award" name="year" reporter:datatype="int"/>
    <field reporter:label="Author" name="author" reporter:datatype="text"/>
    <field reporter:label="Title" name="title" reporter:datatype="text"/>
    <field reporter:label="Notes" name="notes" reporter:datatype="text"/>
  </fields>
  <links>
    <link field="agency" reltype="has_a" key="id" map="" class="erawagency"/>
    <link field="category" reltype="might_have" key="id" map="" class="erawcat"/>
  </links>
</class>
fm_IDL.xml part
Two Tips
• As your group is working, be careful which
collaboration tools you use. Formatting can be evil:
SQL can suffer from single quotes being
changed, and the fieldmapper XML doesn’t like
capitalization.
• If you have trouble with your field mapper
entries, look for similar examples in the stock
one.
Phase 5 – Profit (or at least rejoice).
The Aftermath
• Document and write manual
• Back up changes (git?)
• Write reports
Now Catch Up On House of Cards
Two Other Examples
Purchase Alert
• Calculating thresholds, combining different
kinds of holds
• Make it easier for staff
Weeding Reports
• Unions of never-checked-out and not
recently checked-out items
• Takes a long time to run
• Make it easier for staff
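The weeding report idea can be sketched as a PostgreSQL materialized view (available from 9.3) that unions the two groups and is refreshed on a schedule, so staff query a stored result instead of waiting on the slow query. The view name and the five-year cutoff are assumptions for illustration, and the sketch looks only at action.circulation, so checkouts that have been aged out of that table are not counted.

```sql
create materialized view extend_reporter.weeding_candidates as
-- Copies that have never been checked out.
select ac.id as copy_id,
       ac.circ_lib,
       null::timestamptz as last_checkout
from asset.copy ac
where not ac.deleted
  and not exists (
      select 1
      from action.circulation circ
      where circ.target_copy = ac.id
  )
union all
-- Copies whose most recent checkout is older than the (hypothetical) cutoff.
select ac.id,
       ac.circ_lib,
       max(circ.xact_start)
from asset.copy ac
join action.circulation circ on circ.target_copy = ac.id
where not ac.deleted
group by ac.id, ac.circ_lib
having max(circ.xact_start) < now() - interval '5 years';

-- Re-run periodically (for example nightly) to bring the stored result up to date.
refresh materialized view extend_reporter.weeding_candidates;
```

The trade-off is freshness: the report reflects the data as of the last refresh, not the moment it is run.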

Editor's Notes

  • #4: Let’s get one thing out of the way. This isn’t about the technology. Sure, some technical terms will come up as we talk, but the actual technology is pretty straightforward for your database and systems people. It’s well documented in the community documentation. What I’m going to talk about today is mostly what comes before and after that documentation.
  • #5: We’re going to talk about Data.
  • #7: There’s no reason to match the IDs of your copies to the edit dates of your patrons, even if you can somehow force the system to do it.
  • #8: This is the database.
  • #9: This is the database.
  • #10: This is the database.
  • #11: This is what you’re messing with and it can seem overwhelming.
  • #13: Customizing Evergreen is great but … Example of an addition: you want to add a table of information about zip codes in your region. Example of an alteration: you want to add a column to the actor.usr table to hold school IDs in its own field. It’s a minor alteration, but it still changes an existing thing rather than adding something that exists on its own.
  • #14: I’m going to be talking about additions but a lot of this information applies to alterations as well.
  • #20: So I call Brodart and have them add to my fast tips but … I also worry about classics.
  • #21: I think this might be usable a lot in the future, so I want it to be more general than just this one question, and I broaden it. I want it to be usable for lots of awards, but immediately I also pick five others to include. Right now I only want novels, and there are Pulitzer and Nobel prizes awarded for other things.
  • #25: In my case I do here. The labor isn’t great, and the SQL to add them is simple. But there can be a hidden cost. Will we have to re-evaluate? Will we want to add new categories? I’m opening up these possible questions, but I’m OK with that.
  • #27: This is where things are often handed over to IT staff unquestioningly. It’s like a thick black velvet curtain. It shouldn’t be. Librarians should be behind that curtain pulling the levers. I’m not talking about maintaining the IT infrastructure of the database. I’m not talking about being able to rebuild a transmission. I’m talking about driving the car, working with fingers in the data.
  • #29: At this point I’m working out how I’m going to make this report work. I want things easily extensible, so that’s why I broke it out. I stored Booker award, but if I want to change it to Man Booker I can do it once there. How am I going to connect everything?
  • #30: This gives me the rough idea of how the query works. This is critical. This part determines how it can be represented in the field mapper and then how the reporter works. Pay a lot of attention to this stage. Don’t skimp on it. This will become a model for my SQL queries to test with, creating my field mapper entries and eventually building reporter reports.
  • #31: Avoid redoing a bunch of the more technical stuff, which can be time consuming.
  • #35: This creates the tables, and there are insert statements with it. Your IT folks can do this and it’s not complicated SQL, but what you’ll see are the decisions we made: which columns, such as title and category, are able to be null, and which columns reference other tables. I’m using the extend_reporter schema but really it could go anywhere.
  • #39: The docs on the field mapper are very basic and could stand to be a bit more verbose, but there are lots of examples in the default fieldmapper to guide you.
  • #42: Actually, I realized I made a mistake: in the field mapper I hadn’t connected the author to the reporter field as a foreign key. So it worked in SQL but not here, and that’s an evaluation I had to go back and fix.
  • #43: I’m not going to tell you how to do this stuff but …
  • #45: Purchase Alert: view, Weeding Report: table, refreshed, Postgres 9.3 materialized view