SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
SGF2018 2454:
Lazy Programmers Write Self-Modifying Code
Or Dealing with XML Ordinals
#SASGF
Contact Information
Copyright © 2012-2018 David B. Horvath, CCP — All Rights Reserved
The Speaker can be contacted at:
504 Longbotham Drive, Aston PA 19014-2502, USA
Phone: 1-610-859-8826
Email: dhorvath@cobs.com
Web: http://www.cobs.com/
All trademarks and servicemarks are the property of their respective owners.
XML - Background
• XML stands for eXtensible Markup Language
• Originally created in 1996
• Consists of Markup and Content
• Markup defines the items (fields, etc.) – represented with tags
• Content is the data
• Is transportable and human readable
• Ideally the provider supplies a definition (XSD – XML Schema Definition) to validate against
• If not, you’ll only have the XML data file itself
• Easy for data provider to change layout: update XSD, add data to XML file
• An easy way to think of XML is “CSV on steroids”
• Very flexible: Advantage and Disadvantage
XML
XML File Sample
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><gpx
xmlns="http://www.topografix.com/GPX/1/1"
xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v2"
creator="nüvi 2370" version="1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1
http://www.topografix.com/GPX/1/1/gpx.xsd
http://www.garmin.com/xmlschemas/TrackPointExtensionv2.xsd"><metadata><l
ink href="http://www.garmin.com"><text>Garmin
International</text></link><time>2012-04-
12T04:31:39Z</time></metadata><wpt lat="40.247249" lon="-
75.513001"><ele>28.72</ele><name>002</name><sym>Waypoint</sym></wpt><wpt
lat="39.764033" lon="-75.551346"><ele>61.17</ele>
• Not very helpful viewed that way
• Treating the file as text may be helpful
• Elements like <trk> can have repeating sub-elements like <trkseg>
• But we don’t know the “data model”
• I’m using examples from my Garmin GPS
• I don’t have to sanitize the data like I would with a file from work…
• I’m not going to teach you XML coding today
• http://en.wikipedia.org/wiki/Xml provides good background
gpx_small_xml.txt
XML
XSD File Sample
• Treating the file as text may be helpful
• Describes exactly what is expected in the XML data file
Gpx_xsd.txt
SAS XML Mapping Tool
• Free download from http://support.sas.com/kb/33/584.html
• Creates a Map for SAS to read XML as a “SAS Dataset”
• Can process an XML data file to create Map
• Works with subset of large file
• Not all elements appear for any particular key
• Tool has to guess data type of elements (like proc import CSV)
• Better to process an XSD file to create the Map
• Full definition (no “guessing” required)
• Not always available
• In easiest usage, will create keys to connect elements (“Ordinals”)
• The map is in XML format
Gpx_map.txt
SAS XML Engine
Code to Access Data in XML
• Really very simple to use once Map is built:
filename test "/export/home/myid/gpx.xml";
filename SXLEMAP "/export/home/myid/gpx.map";
libname test xml xmlmap=SXLEMAP access=READONLY;
• And then use it much like any other SAS Dataset:
proc contents data=test._all_ ;
run;
• Or
data tableonly;
set test.member END=EOF;
output;
run;
• Modifying the XML file is more difficult (and not part of
this presentation):
ERROR: XMLMap= has been specified on the XML Libname
assignment. The output produced via this option will
change in upcoming releases. Correct the XML
Libname(remove XMLMap= option) and resubmit. Output
generation aborted.
• Everything you ever wanted to know about the SAS XML engine is available at
http://support.sas.com/rnd/base/xmlengine/index.html
Our Problem
• Vendor provided XML file:
• Limited documentation
• No internal experience
• Short timeline
• Hundreds of internal “objects” (mapped to hundreds of SAS
datasets)
• Needed to be able to “see” data to learn about it
• Once in production:
• Daily input file
• Concatenated output Datasets
Dealing with Ordinals over Multiple Files
• Every object in the SAS-mapped XML file has an Ordinal to ensure uniqueness (for example, gpx).
• Child objects contain their parent Ordinals (for example, rte).
• Those Ordinals are unique only within a specific XML data file, not over time.
• To append today’s data to yesterday’s, the Ordinals need to change.
• Simple solution:
• Find yesterday’s maximum Ordinal for each table
• Add it to today’s values
• Append to yesterday’s accumulated file
• A better solution would be to build records in your desired format
• But you have to understand the data in order to do that
• Reduces flexibility
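As a rough sketch, the simple solution for a single table might look like the following. The librefs test (today's XML file) and output (the accumulated history) follow the examples in this deck, but the table name WPT, its key column WPT_ORDINAL, the work dataset, and the macro variable max_ord are assumptions made for illustration. A child table's parent Ordinal would need the same treatment, using the parent table's maximum.

```sas
/* Assumptions: TEST points at today's XML file through the XML engine,
   OUTPUT holds the accumulated history and already contains WPT, and
   WPT_ORDINAL is the key column. All of these names are illustrative. */

/* 1. Find yesterday's maximum Ordinal in the accumulated history. */
proc sql noprint;
  select coalesce(max(WPT_ORDINAL), 0) format=20.
    into :max_ord
    from output.WPT;
quit;

/* 2. Add it to today's values. */
data work.WPT_today;
  set test.WPT;
  WPT_ORDINAL = WPT_ORDINAL + &max_ord;
run;

/* 3. Append to yesterday's accumulated file. */
proc append base=output.WPT data=work.WPT_today;
run;
```

Writing this once is easy; the trouble starts when the same three steps have to be repeated for every table.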
• GPX example converts to 19 SAS datasets
• Repeating the same code for 19 elements (XML pseudo SAS
Datasets) is a pain.
• Can you imagine doing it for hundreds?
• I’m lazy and I make mistakes.
• I’d really rather not copy & paste & edit the same code 19 (or 190)
times.
• I’d rather not have to repeat the process every time the file
changes (or a new element appears – no XSD)
• Since the same process applies for every one of the elements, why
not let code do the work for me?
Using SAS Code to Generate SAS Code
Self-modifying Code
• The mechanism to create self-modifying code within SAS is rather
simple since it is an interpreted language. You use File, Put, and
%include:
filename sourcecd "gpxxml_generated&DATADATE..sas";
data _null_;
  file sourcecd;               /* route PUT output to the code file */
  set temp.maxvals end=EOF;
  /* use put statements */
run;
%include sourcecd;             /* execute the code just written */
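To see the File, Put, and %include pattern run end to end, here is a small self-contained sketch. It is not the production code: the fileref gencode, the driver dataset work.tables, and the use of the SASHELP sample tables are all assumptions, chosen so the example does not depend on any XML file.

```sas
/* Toy example: every name here is illustrative. */
filename gencode temp;             /* temporary file to hold generated code */

data work.tables;                  /* driver data: one row per table */
  length tablename $32;
  input tablename $;
  datalines;
class
cars
;
run;

data _null_;
  file gencode;
  set work.tables;
  /* Write one PROC PRINT step per driver row. */
  put "proc print data=sashelp." tablename "(obs=3); run;";
run;

%include gencode / source2;        /* run the generated code, echo it to the log */
```

The source2 option is handy while developing because the generated statements appear in the log alongside their notes and errors.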
Using SAS Code to Generate SAS Code
Creating Code File
• The maxvals dataset contains the maximum ordinals from yesterday.
• I create a file that contains SAS code, and by including it, cause execution of that
new code. In detail:
filename sourcec2 "xml_generated&DATADATE._2.sas";
data _null_;
  file sourcec2;
  set temp.maxvals end=EOF;
  /* Emit the libname once, ahead of the first generated step */
  if (_n_ = 1) then do; put "libname temp 'temp';"; end;
  /* One data step per table: shift today's ordinals */
  put "data temp." tableonly "; set " member "END=EOF;";
  if prikey ne "" and prival ne . then put prikey " = " prikey " + " prival ";";
  if parkey ne "" and parval ne . then put parkey " = " parkey " + " parval ";";
  put "output;"; put "run;";
  /* Append the shifted rows to the accumulated history */
  put "proc datasets; append base=output." tableonly " data=temp." tableonly "; run;";
  /* End the interactive PROC DATASETS after the last table */
  if EOF then do;
    put "quit;";
  end;
run;
Using SAS Code to Generate SAS Code
The Generated Code
• The resulting code snippet defined in sourcec2:
/* Libname test and output defined in main program */
libname temp '/export/home/myid/temp';
data temp.AUTHOR ;
set test.AUTHOR END=EOF;
METADATA_ORDINAL = METADATA_ORDINAL + 257 ;
output;
run;
proc datasets;
append base=output.AUTHOR data=temp.AUTHOR ;
run;
data temp.BOUNDS ;
set test.BOUNDS END=EOF;
METADATA_ORDINAL = METADATA_ORDINAL + 257 ;
output;
run;
proc datasets;
append base=output.BOUNDS data=temp.BOUNDS ;
run;
• I also could have created a full program and
executed with a new sas command.
• This code is executed by including it back into the
main program.
• Each day, the generated code is a little different
because the maximum ordinals (in the example
above, the value 257) change each day.
• That way, the history contains unique ordinals over
time.
Afterthoughts
• Used dynamic code rather than merging the maximum ordinal into
each record or macro – fewer passes through the data
• Encountered issues with the raw data – had to deal with invalid
tags that broke the XML engine.
• One disadvantage of the engine is the need to parse the entire file
for each object (200 objects, 200 passes through the file)
• We resorted to rsubmit to process 10 sets in parallel
• Seriously impacted I/O capabilities of system
• It would have been better to save off the maximum ordinal at each
stage rather than getting it from history.
• Had concerns about recovery and building new objects
• I used proc contents; now that I know more about them
I could have used the dictionary tables instead
• Although these examples were written with version 1 of
the XML engine, when we moved to version 2, the only
code change (besides the map) was the following line:
libname test xmlv2 xmlmap=SXLEMAP access=readonly;
• Set nobs= does not work with XML.
• It compiles (no warning/error) and executes but returns the value
zero.
• I learned this while researching a presentation for PhilaSUG.
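Since nobs= reports zero for XML-engine tables, one workaround is to count the observations while reading them. This is a minimal sketch, assuming the libref test from earlier and a hypothetical table WPT; the macro variable name wpt_count is also made up for this example.

```sas
%let wpt_count = 0;   /* default in case the table has no rows */

data _null_;
  set test.WPT end=eof;
  count + 1;          /* sum statement: retained running count */
  if eof then call symputx('wpt_count', count);
run;

%put NOTE: test.WPT has &wpt_count observations.;
```

The count costs a full pass through the XML file, so it is best folded into a data step that already has to read the table.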
Questions and Answers
Wrap Up
Using SAS Code to Generate SAS Code
Example Files
• Walking through the code: xml_process_generator.txt
• We can also look at the log: xml_process_generator.txt
• The generated SAS code (in case anyone cares): gpxxml_generated20120404.txt, gpxxml_generated20120404_2.txt
Presenter
David B. Horvath, CCP, MS
David is an IT professional who has worked with various platforms since the 1980s with a variety of development and analysis tools.
He has previously presented sessions at PhilaSUG and SESUG, as well as workshops and seminars in Australia, France, the US, Canada, and Oxford, England (about the British author Nevil Shute) for various organizations.
He holds an undergraduate degree in Computer and Information Sciences from Temple University and a Master's in Organizational Dynamics from UPENN.
He achieved the Certified Computing Professional designation with honors.
Most of his career has been in consulting (although recently he has been in-house) in the Philadelphia, PA area.
He has several books to his credit (none directly SAS related) and is an Adjunct Instructor at the University of Phoenix covering IT topics. He is currently working in Data Analytics Infrastructure for a regional bank.

More Related Content

PPTX
Data storage format in hdfs
PPT
20140612 phila sug proc import
PPTX
SAS on Kubernetes
PPTX
SAS AX 2018 - Manufacturing Insights by William Nadolski
PPT
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...
PDF
Consolidate your SAP System landscape Teched && d-code 2014
PPTX
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
PDF
Sas Talk To R Users Group
Data storage format in hdfs
20140612 phila sug proc import
SAS on Kubernetes
SAS AX 2018 - Manufacturing Insights by William Nadolski
20120606 Lazy Programmers Write Self-Modifying Code /or/ Dealing with XML Ord...
Consolidate your SAP System landscape Teched && d-code 2014
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
Sas Talk To R Users Group

Similar to 20180410 sasgf2018 2454 lazy programmers xml ppt (20)

PPTX
Seven Agile Methods that Help Deliver Visualizations Agilely
PDF
20190615 hkos-mysql-troubleshootingandperformancev2
PPTX
PPTX
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
DOCX
SAS Programming Notes
PDF
Rename with Confidence – Building Dynamic FileMaker Systems
PDF
B1 intercompany sizing guide
PDF
How Can I Save Time and Build Trust With My Data Preparation.pdf
PPTX
Rapid Model Refresh (RMR) in Online Fraud Detection Engine
PDF
Adobe Flex Development and Backend Integration With BlazeDS
PDF
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...
PDF
Sas 913 Language Reference Dictionary 5th Ed Sas Publishing
PDF
Grc 300 Sap Sccess Control Implementation And Configuration Participant Handb...
PDF
Analytics with SAS
PDF
Postgres Integrates Effectively in the "Enterprise Sandbox"
 
PDF
Postgres.foreign.data.wrappers.2015
 
PPT
Whats Hot, Whats Not Skills For Sas® Professionals (Presentation)
PPTX
Data Regions: Modernizing your company's data ecosystem
PDF
SAP HANA SPS09 - XS Programming Model
Seven Agile Methods that Help Deliver Visualizations Agilely
20190615 hkos-mysql-troubleshootingandperformancev2
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
SAS Programming Notes
Rename with Confidence – Building Dynamic FileMaker Systems
B1 intercompany sizing guide
How Can I Save Time and Build Trust With My Data Preparation.pdf
Rapid Model Refresh (RMR) in Online Fraud Detection Engine
Adobe Flex Development and Backend Integration With BlazeDS
Oracle’s Advanced Analytics & Machine Learning 12.2c New Features & Road Map;...
Sas 913 Language Reference Dictionary 5th Ed Sas Publishing
Grc 300 Sap Sccess Control Implementation And Configuration Participant Handb...
Analytics with SAS
Postgres Integrates Effectively in the "Enterprise Sandbox"
 
Postgres.foreign.data.wrappers.2015
 
Whats Hot, Whats Not Skills For Sas® Professionals (Presentation)
Data Regions: Modernizing your company's data ecosystem
SAP HANA SPS09 - XS Programming Model
Ad

More from David Horvath (11)

PDF
Introduction to Linux with Focus on Raspberry Pi
PDF
20190413 zen and the art of programming
PDF
(SAS) UNIX X Command Tips and Tricks
PPTX
20180414 nevil shute no highway modern metal fatigue
PDF
20180324 leveraging unix tools
PDF
20180324 zen and the art of programming
PDF
20171106 sesug bb 184 zen and the art of problem solving
PDF
20171106 sesug bb 180 proc import ppt
PPT
20150904 "A Few Words About 'In The Wet' by Nevil Shute"
PPT
20150312 NOBS for Noobs
PPTX
20170419 To COMPRESS or Not, to COMPRESS or ZIP
Introduction to Linux with Focus on Raspberry Pi
20190413 zen and the art of programming
(SAS) UNIX X Command Tips and Tricks
20180414 nevil shute no highway modern metal fatigue
20180324 leveraging unix tools
20180324 zen and the art of programming
20171106 sesug bb 184 zen and the art of problem solving
20171106 sesug bb 180 proc import ppt
20150904 "A Few Words About 'In The Wet' by Nevil Shute"
20150312 NOBS for Noobs
20170419 To COMPRESS or Not, to COMPRESS or ZIP
Ad

Recently uploaded (20)

PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Cloud computing and distributed systems.
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Machine Learning_overview_presentation.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
MYSQL Presentation for SQL database connectivity
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Big Data Technologies - Introduction.pptx
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Approach and Philosophy of On baking technology
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Electronic commerce courselecture one. Pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPTX
Programs and apps: productivity, graphics, security and other tools
Mobile App Security Testing_ A Comprehensive Guide.pdf
Cloud computing and distributed systems.
Network Security Unit 5.pdf for BCA BBA.
Machine Learning_overview_presentation.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
20250228 LYD VKU AI Blended-Learning.pptx
NewMind AI Weekly Chronicles - August'25-Week II
Diabetes mellitus diagnosis method based random forest with bat algorithm
Agricultural_Statistics_at_a_Glance_2022_0.pdf
MYSQL Presentation for SQL database connectivity
“AI and Expert System Decision Support & Business Intelligence Systems”
Big Data Technologies - Introduction.pptx
Review of recent advances in non-invasive hemoglobin estimation
Approach and Philosophy of On baking technology
Chapter 3 Spatial Domain Image Processing.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Spectral efficient network and resource selection model in 5G networks
Electronic commerce courselecture one. Pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Programs and apps: productivity, graphics, security and other tools

20180410 sasgf2018 2454 lazy programmers xml ppt

  • 1. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
  • 2. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. SGF2018 2454: Lazy Programmers Write Self-Modifying Code Or Dealing with XML Ordinals
  • 3. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF Contact Information Copyright © 2012-2018 David B. Horvath, CCP — All Rights Reserved The Speaker can be contacted at: 504 Longbotham Drive, Aston PA 19014-2502, USA Phone: 1-610-859-8826 Email: dhorvath@cobs.com Web: http://www.cobs.com/ All trademarks and servicemarks are the property of their respective owners.
  • 4. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF XML - Background • XML stands for eXtensible Markup Language • Originally created in 1996 • Consists of Markup and Content • Markup defines the items (fields, etc.) – represented with tags • Content is the data • Is transportable and human readable • If well formed, you’ll have a definition (XSD – XML Schema Definition) • If not well formed, you’ll only have the XML data file itself • Easy for data provider to change layout: update XSD, add data to XML file • An easy way to think of XML is “CSV on steroids” • Very flexible: Advantage and Disadvantage
  • 5. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF XML XML File Sample <?xml version="1.0" encoding="UTF-8" standalone="no" ?><gpx xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v2" creator="nüvi 2370" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/TrackPointExtensionv2.xsd"><metadata><l ink href="http://www.garmin.com"><text>Garmin International</text></link><time>2012-04- 12T04:31:39Z</time></metadata><wpt lat="40.247249" lon="- 75.513001"><ele>28.72</ele><name>002</name><sym>Waypoint</sym></wpt><wpt lat="39.764033" lon="-75.551346"><ele>61.17</ele> • Not very helpful viewed that way
  • 6. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF XML XML File Sample
  • 7. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF XML XML File Sample • Treating the file as text may be helpful • Elements like <trk> can have repeating sub-elements like <trkseg> • But we don’t know the “data model” • I’m using examples from my Garmin GPS • I don’t have to sanitize the data like I would with a file from work… • I’m not going to teach you XML coding today • http://en.wikipedia.org/wiki/Xml provides good background gpx_small_xml.txt
  • 8. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF XML XSD File Sample • Treating the file as text may be helpful • Describes exactly what is expected in the XML data file Gpx_xsd.txt
  • 9. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF SAS XML Mapping Tool • Free download from http://support.sas.com/kb/33/584.html • Creates a Map for SAS to read XML as a “SAS Dataset” • Can process an XML data file to create Map • Works with subset of large file • Not all elements appear for any particular key • Tool has to guess data type of elements (like proc import CSV) • Better to process an XSD file to create the Map • Full definition (no “guessing” required) • Not always available • In easiest usage, will create keys to connect elements (“Ordinals”) • The map is in XML format Gpx_map.txt
  • 10. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF SAS XML Engine Code to Access Data in XML • Really very simple to use once Map is built: filename test "/export/home/myid/gpx.xml"; filename SXLEMAP "/export/home/myid/gpx.map"; libname test xml xmlmap=SXLEMAP access=READONLY; • And then use it much like any other SAS Dataset: proc contents data=test._all_ ; run; • Or data tableonly; set test.member END=EOF; output; run;
  • 11. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF SAS XML Engine Code to access data in XML • Modifying the XML file is more difficult (and not part of this presentation): ERROR: XMLMap= has been specified on the XML Libname assignment. The output produced via this option will change in upcoming releases. Correct the XML Libname(remove XMLMap= option) and resubmit. Output generation aborted. • Everything you ever wanted to know about the SAS XML engine is available at • http://support.sas.com/rnd/base/xmlengine/index.html
  • 12. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. #SASGF Our Problem • Vendor provided XML file: • Limited documentation • No internal experience • Short timeline • Hundreds of internal “objects” (mapped to hundreds of SAS datasets) • Needed to be able to “see” data to learn about it • Once in production: • Daily input file • Concatenated output Datasets
  • 13. Dealing with Ordinals over Multiple Files
    • Every object in the SAS mapped XML file will have an Ordinal to ensure uniqueness (gpx):
    • Child objects contain their parent Ordinals (rte):
  • 14. Dealing with Ordinals over Multiple Files
    • Child objects contain their parent Ordinals (rte):
  • 15. Dealing with Ordinals over Multiple Files
    • Those Ordinals are only unique within a specific XML data file, not over time.
    • In order to append today’s data to yesterday’s, the Ordinals need to change.
    • Simple solution:
      • Find yesterday’s maximum Ordinal for each table
      • Add it to today’s values
      • Append to yesterday’s accumulated file
    • A better solution would be to build records in your desired format
      • But you have to understand the data in order to do that
      • Reduces flexibility
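For a single table, the three steps of the simple solution might look like the sketch below. It is not the presentation's code. It assumes the librefs test (today's XML) and output (accumulated history) are already assigned, that output.RTE already exists, and that the table is called RTE with its own key in a column named RTE_ORDINAL. The table name, column name, and the macro variable maxord are all illustrative assumptions.

```sas
/* Step 1: yesterday's maximum ordinal for this table.            */
/* COALESCE turns a missing maximum (empty history) into 0.       */
/* RTE, RTE_ORDINAL and maxord are assumed names for this sketch. */
proc sql noprint;
  select coalesce(max(RTE_ORDINAL), 0)
    into :maxord trimmed
    from output.RTE;
quit;

/* Step 2: shift today's ordinals past yesterday's maximum. */
data work.RTE;
  set test.RTE;
  RTE_ORDINAL = RTE_ORDINAL + &maxord;
run;

/* Step 3: append the shifted rows to the accumulated history. */
proc append base=output.RTE data=work.RTE;
run;
```

A parent ordinal carried in a child table has to be shifted by the offset used for the parent table, not by the child's own maximum, otherwise the link between parent and child rows is lost.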
  • 16. Dealing with Ordinals over Multiple Files
    • GPX example converts to 19 SAS datasets
  • 17. Dealing with Ordinals over Multiple Files
    • Repeating the same code for 19 elements (XML pseudo SAS Datasets) is a pain.
    • Can you imagine doing it for hundreds?
    • I’m lazy and I make mistakes.
    • I’d really rather not copy & paste & edit the same code 19 (or 190) times.
    • I’d rather not have to repeat the process every time the file changes (or a new element appears and there is no XSD).
    • Since the same process applies for every one of the elements, why not let code do the work for me?
  • 18. Using SAS Code to Generate SAS Code: Self-modifying Code
    • The mechanism to create self-modifying code within SAS is rather simple since it is an interpreted language. You use File, Put, and %include:
      filename sourcec2 "gpxxml_generated&DATADATE..sas";
      data _null_;
        file sourcec2;
        set temp.maxvals end=EOF;
        /* use put statements */
      run;
      %include sourcec2;
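The File, Put, and %include mechanism can be tried in isolation with a few lines. This is a minimal sketch, not taken from the presentation: the fileref gen and the dataset work.demo are made-up names, and a temporary file stands in for the dated .sas file.

```sas
/* Assumed names for illustration: fileref GEN, dataset WORK.DEMO. */
filename gen temp;            /* temporary file to hold the generated code */

/* Write a tiny SAS program into the file with PUT statements. */
data _null_;
  file gen;
  put 'data work.demo;';
  put '  x = 1;';
  put 'run;';
run;

/* Pull the generated program back in and run it.            */
/* SOURCE2 echoes the included statements to the SAS log.    */
%include gen / source2;
```

Echoing the included lines with SOURCE2 makes the log show exactly what was generated, which helps when the generated code changes from day to day.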
  • 19. Using SAS Code to Generate SAS Code: Creating Code File
    • The maxvals dataset contains the maximum ordinals from yesterday.
    • I create a file that contains SAS code, and by including it, cause execution of that new code. In detail:
      filename sourcec2 "xml_generated&DATADATE._2.sas";
      data _null_;
        file sourcec2;
        set temp.maxvals end=EOF;
        /* emit the libname once, at the top of the generated file */
        if (_n_ = 1) then do;
          put "libname temp 'temp';";
        end;
        /* one data step per table: copy today's data, shifting the ordinals */
        put "data temp." tableonly "; set " member "END=EOF;";
        if prikey ne "" and prival ne . then
          put prikey " = " prikey " + " prival ";";
        if parkey ne "" and parval ne . then
          put parkey " = " parkey " + " parval ";";
        put "output;";
        put "run;";
        /* append the shifted copy to the accumulated history */
        put "proc datasets; append base=output." tableonly " data=temp." tableonly "; run;";
        if EOF then do;
          put " run;";
        end;
      run;
  • 20. Using SAS Code to Generate SAS Code: The Generated Code
    • The resulting code snippet defined in sourcec2:
      /* Libname test and output defined in main program */
      libname temp '/export/home/myid/temp';
      data temp.AUTHOR ; set test.AUTHOR END=EOF;
      METADATA_ORDINAL = METADATA_ORDINAL + 257 ;
      output;
      run;
      proc datasets; append base=output.AUTHOR data=temp.AUTHOR ; run;
      data temp.BOUNDS ; set test.BOUNDS END=EOF;
      METADATA_ORDINAL = METADATA_ORDINAL + 257 ;
      output;
      run;
      proc datasets; append base=output.BOUNDS data=temp.BOUNDS ; run;
  • 21. Using SAS Code to Generate SAS Code
    • I also could have created a full program and executed it with a new sas command.
    • This code is executed by including it back into the main program.
    • Each day, the generated code is a little different because the maximum ordinals (in the example above, the value 257) change each day.
    • That way, the history contains unique ordinals over time.
  • 22. Afterthoughts
    • Used dynamic code rather than merging the maximum ordinal into each record or using a macro: fewer passes through the data
    • Encountered issues with the raw data: had to deal with invalid tags that broke the XML engine.
    • One disadvantage of the engine is the need to parse the entire file for each object (200 objects, 200 passes through the file)
      • We resorted to rsubmit to process 10 sets in parallel
      • Seriously impacted I/O capabilities of the system
    • It would have been better to save off the maximum ordinal at each stage rather than getting it from history.
      • Had concerns about recovery and building new objects
  • 23. Afterthoughts
    • I used proc contents; now that I know more about them, I could have used the dictionary tables instead
    • Although these examples were written with version 1 of the XML engine, when we moved to version 2, the only code change (besides the map) was the following line:
      libname test xmlv2 xmlmap=SXLEMAP access=readonly;
    • Set nobs= does not work with XML.
      • It compiles (no warning/error) and executes but returns the value zero.
      • I learned this while researching a presentation for PhilaSUG.
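Two of these afterthoughts can be sketched in a few lines. The code below is an illustration, not the presentation's code: it assumes the XML libref is test, that the generated key columns all have names ending in ORDINAL, and that a table named AUTHOR exists. The names work.ordinal_cols and author_rows are made up. Whether the dictionary tables return full column metadata for a given XML engine version should be checked against your own map.

```sas
/* List the ordinal columns of every table in the XML library   */
/* from the dictionary tables instead of PROC CONTENTS output.  */
/* Assumption: key columns have names ending in ORDINAL.        */
proc sql;
  create table work.ordinal_cols as
  select memname, name
    from dictionary.columns
    where libname = 'TEST'
      and upcase(name) like '%ORDINAL'
    order by memname, name;
quit;

/* Since NOBS= returns zero for XML tables, count the rows by   */
/* reading them. AUTHOR_ROWS is a made-up macro variable name.  */
proc sql noprint;
  select count(*)
    into :author_rows trimmed
    from test.AUTHOR;
quit;

%put NOTE: test.AUTHOR has &author_rows rows.;
```

The row count costs a full pass through the XML file, which is the same per-object parsing cost noted on the previous slide.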
  • 24. Wrap Up
    • Questions and Answers
  • 25. Using SAS Code to Generate SAS Code: Example Files
    • Walking through the code:
    • We can also look at the log:
    • The generated SAS code (in case anyone cares):
    • Files: xml_process_generator.txt, xml_process_generator.txt, gpxxml_generated20120404.txt, gpxxml_generated20120404_2.txt
  • 26. Contact Information
    The Speaker can be contacted at:
    David B. Horvath, CCP
    504 Longbotham Drive, Aston PA 19014-2502, USA
    Phone: 1-610-859-8826
    Email: dhorvath@cobs.com
    Web: http://www.cobs.com/
  • 27. Presenter
    David B. Horvath, CCP, MS
    David is an IT Professional who has worked with various platforms since the 1980s with a variety of development and analysis tools. He has presented sessions at PhilaSUG and SESUG previously, as well as workshops and seminars in Australia, France, the US, Canada, and Oxford, England (about the British author Nevil Shute) for various organizations. His undergraduate degree is in Computer and Information Sciences from Temple University, and he holds a Master's in Organizational Dynamics from UPENN. He achieved the Certified Computing Professional designation with honors. Most of his career has been in consulting (although recently he has been in-house) in the Philadelphia, PA area. He has several books to his credit (none directly SAS related) and is an Adjunct Instructor at the University of Phoenix covering IT topics. He is currently working in Data Analytics Infrastructure for a regional bank.