International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 893
Public Data Analysis and Utilization
Duckki Lee
Assistant Professor, Department of Smart Software, Yonam Institute of Technology, Jinju, South Korea
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - The Fourth Industrial Revolution heralds the
advent of the data era. Companies that retain and use
enormous amounts of data are at the forefront of market
innovation, and artificial intelligence and robotics, which
are fast-growing in all aspects of the nation and society, are
also data-driven. Following this trend, advanced nations,
particularly the United States, realize the critical role of
data in determining future competitiveness and are
revitalizing the data sector and making public data more
accessible. This paper analyses domestic and international
trends in public data and discusses the openness and use of
domestic public data. It then highlights the issues and
concerns that arise when developing commercial services
that make use of public data.
Key Words: Public Data, Open Government Data, Public
Data Analysis, Public Data Utilization, Commercial Services
using Public Data
1. INTRODUCTION
The Fourth Industrial Revolution heralds the advent of the
data era. The age of oil and coal ushered in the first
industrial revolution; it was followed by the age of
electricity and communication, and then by the age of
information technology, which is now giving way to the age
of data. The Fourth Industrial Revolution is driven by
intelligent information technology, which relies on data as
a fundamental component for intelligence, automation, and
autonomy.
The era we are entering is one of a data economy [1], in
which data is a critical resource in addition to land, labor,
and capital, and a data society [2, 3], in which all aspects of
everyday life are data-driven. Companies that retain and
use enormous amounts of data are at the forefront of
market innovation, and artificial intelligence and robotics,
which are fast-growing in all aspects of the nation and
society, are also data-driven. Following this trend,
advanced Western nations, notably the US, acknowledge
the critical role of data in determining future
competitiveness and are actively pursuing data hegemony
via strategies and increased investment in the data
industry [4-7]. Additionally, these nations are striving not
only to enhance economic value through the opening of
public data, but also to create social value through the
active use of data to address pressing issues facing the
country and society, such as transportation, the
environment, health, hygiene, disaster preparedness, and
safety. In keeping with this trend, the Korean government
is pursuing a state-of-the-art intelligent government for
the twenty-first century and promoting the data economy.
To encourage the realization of social value through the
use of data, it is also actively supporting legislation,
system development, and the establishment of portal sites.
Public data must be accessible if people's access to it is
to expand and if value is to be created through data use in
the data economy and in social activities. Opening public
data is not sufficient on its own for the public data
opening policy to succeed; a data ecosystem in which data
can be disseminated, exploited, and circulated must also be
established [8]. To this end, South Korea has operated a
public data portal [24] since 2013. By making data uploaded
by central and local governments freely available to all
users, including corporations and ordinary residents, the
portal plays a critical role in the distribution and use of
data. Evaluations show that the portal has produced
substantial results: services that use public data continue
to proliferate, and Korea topped the OECD's OUR Data Index,
a national assessment of public data openness, in three
consecutive editions (2015, 2017, and 2019) [9, 22].
However, public data portals have drawn a variety of
criticism: the quantity of data is vast, yet essential data
cannot be found [10]; the data formats are inconvenient to
use; and the data is difficult to integrate because input
conventions and file formats differ between data sets [11].
This paper analyses domestic and international trends in
public data and discusses the openness and use of
domestic public data. It then highlights the issues and
concerns that arise when developing commercial services
that make use of public data.
2. Trends in Domestic and International Public Data
The term "public data" refers to information and data
generated or authorized by the government. It is data that
anyone may freely use, reuse, and redistribute, and from
which users can produce their own works [12]. The
World Wide Web Foundation defines public data as data
that is publicly accessible online, reusable, and machine-
readable, and that enables huge volumes of data to be
downloaded and used as a single dataset for free [13].
According to Article 2 of Korea's Public Data Act[14],
public data refers to data or information processed
optically or electronically by public institutions for the
objectives specified by applicable laws and regulations.
Public data is generated and maintained by governments and
public institutions to achieve public objectives, such as
conducting business and providing public services, and is a
critical resource with enormous potential value. It
encompasses information on residents, including their
resident registration, income, property, medical treatment,
tax payment, and real estate, as well as information about
the weather, transportation, logistics, energy, and water
and sewage systems that affect their everyday lives.
US
The United States established data.gov [15], a unified
portal for opening public data, in 2009 as part of an open
government initiative built on the principle of opening
government-held public data. The portal started with the
release of 76 data sets held by 11 government agencies; as
of February 2022, it exposes data from 48 state
governments, 48 cities, and 152 government-affiliated
institutions [15]. Data disclosure has remained active,
with 342,000 data sets available in a variety of domains,
including health, labor, education, transportation, and
crime. The data is primarily available as raw data sets,
geodata sets, interactive data, source code, APIs,
programs, and applications, and is updated monthly. Data is
provided in various formats, such as xls, csv, and txt, to
allow reuse, and conversion to RDF is straightforward. In
addition, data sets can be reclassified, adjusted, and
combined with other data sets within the system to build
mashups, which makes the system a powerful tool.
data.gov is based on the 'Open Government Platform'
(OGPL). In 2013, Data.gov 2.0 was introduced on top of
CKAN [16], an open-source data platform. As a consequence,
data catalogue services have been established that connect
open government data sites throughout the United States
with those of other nations, states, and cities. Using
CKAN, the public data portal (data.gov) focuses on the
integration and management of metadata. Additionally,
public data is visualized using maps, and DATA USA [17] is
available separately for comparing and analyzing major US
cities.
Fig -1: US DATA.GOV and DATAUSA
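Because data.gov manages its metadata through CKAN, a catalogue search can be illustrated with CKAN's Action API. The following is a minimal sketch, not the portal's documented client: the base URL `https://catalog.data.gov/api/3/action` is an assumption about where the catalogue is served, the helper names are hypothetical, and the sample reply is hand-written data shaped like a `package_search` response.

```python
import json
from urllib.parse import urlencode

# Assumed catalogue endpoint; CKAN serves its Action API under /api/3/action/.
CATALOG_BASE = "https://catalog.data.gov/api/3/action"


def build_search_url(query, rows=5):
    """Build a CKAN package_search URL for a free-text query."""
    return f"{CATALOG_BASE}/package_search?{urlencode({'q': query, 'rows': rows})}"


def summarize_search(response_text):
    """Return (title, sorted resource formats) for each data set in a reply."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    summary = []
    for dataset in payload["result"]["results"]:
        formats = {res.get("format", "") for res in dataset.get("resources", [])}
        formats.discard("")  # ignore resources without a declared format
        summary.append((dataset.get("title", ""), sorted(formats)))
    return summary


# Hand-written sample reply (hypothetical data), shaped like a CKAN response.
SAMPLE_REPLY = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "title": "Example Air Quality Measurements",
                "resources": [{"format": "CSV"}, {"format": "JSON"}, {}],
            }
        ],
    },
})

if __name__ == "__main__":
    print(build_search_url("air quality"))
    print(summarize_search(SAMPLE_REPLY))
```

Parsing is kept separate from the network call so that the metadata handling can be checked offline against a stored reply.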
UK
Through data.gov.uk [18], a public data portal established
with the participation of Tim Berners-Lee, the inventor of
the web and linked data, the United Kingdom has opened
51,957 datasets across 14 sectors, including economy,
education, environment, defense, health, and transportation.
and linked data. Since 2010, the UK has actively promoted
an open data policy and operated an open data portal,
data.gov.uk, which enables search and access to practically
all public sector data. The UK government's open data
policy intends to make information more accessible to the
public and advance the public interest through the use of
open data for policy research[19].
The British government established the Open Data
Institute (ODI) [20] in November 2012 as a non-profit
organization dedicated to using public data to foster new
enterprises and startups. It does so by providing support
and developing talent for venture firms that are attempting
to build new businesses on technology and services
connected to public data. Approximately 30 startups have
been formed as a result of ODI's startup support programs.
OpenCorporates [21], a market leader, owns the world's
biggest corporate disclosure database, which holds 190
million corporate records. By providing this data through a
search portal, the firm has received positive feedback from
civic organizations and investors who want to monitor
corporations.
Fig -2: UK DATA.GOV.UK and OpenCorporates
South Korea
Since the enactment of the Public Data Act in 2013 [14],
South Korea has actively promoted an open data policy. To
promote the openness and use of public data, data.go.kr [24]
was established and is operated as a public data portal
that delivers integrated information. The portal makes
public data available in a variety of forms, including file
data, open APIs, and visualizations, so that anyone can use
it easily and can find the desired public data quickly and
accurately through a simple search. A data industry revival
strategy headed
by data-related governmental agencies has been
established and is being promoted in South Korea. As a
consequence, South Korea topped the OECD's OUR Data Index,
an assessment of the openness of public data, in three
consecutive editions (2015, 2017, and 2019) [9, 22].
Additionally, Korea was ranked 17th in 2014, 8th in 2015,
and 5th in 2016 in the WWW Foundation's Open Data
Barometer (ODB) [13].
Currently, 68,865 public data sets are accessible through
the public data portal: 51,110 file data sets, 9,020 open
APIs, and 8,735 standard data sets. The number of data sets
in each field is shown in Table-1, and the national data
map illustrating the proportion of data in each field is
presented in Fig -3.
Table -1: The number of Datasets by Topics
Topics             Data Sets    Topics               Data Sets
Education               3925    Health Care               3719
Land                    4270    Disaster Recovery         3512
Administration          9090    Transportation            5647
Finance                 5027    Weather                   5096
Industry                6473    Technology                2066
Social Services         4132    Agriculture               4201
Food                    1997    Unification                964
Culture                 8333    Law                        413
Fig -3: Korea National Data Map
3. Public Data Provision and Utilization Analysis in
South Korea
In South Korea, the total number of opened public data sets
grew 12.8-fold between 2013 and 2021: from 5,272 in 2013 to
24,588 in 2017, 28,400 in 2018, 33,600 in 2019, 55,139 in
2020, and 67,441 in 2021.
Chart -1: Public Data Provision Statistics
The number of opened public data sets is expanding fast as
a result of the Korean government's efforts to open public
data, and the use of public data is increasing
exponentially as more businesses utilize data. The number
of public data users climbed from 13,923 in 2013 to
3,871,984 in 2017, 7,549,179 in 2018, 13,141,413 in 2019,
20,848,555 in 2020, and 33,340,436 in 2021, a 2,394-fold
increase over the initial figure.
Chart -2: Public Data Utilization Statistics
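The growth factors quoted in this section can be reproduced from the reported first and last values. The short sketch below recomputes the fold increase and, as an additional illustration that is not part of the original analysis, the compound annual growth rate over the eight yearly steps from 2013 to 2021. The function names are hypothetical.

```python
def fold_increase(first, last):
    """How many times larger the last value is than the first."""
    return last / first


def annual_growth_rate(first, last, years):
    """Compound annual growth rate over `years` yearly steps."""
    return (last / first) ** (1 / years) - 1


# First and last values reported in this section (2013 and 2021).
DATASETS = {2013: 5_272, 2021: 67_441}
USERS = {2013: 13_923, 2021: 33_340_436}

if __name__ == "__main__":
    for label, series in (("data sets", DATASETS), ("users", USERS)):
        first, last = series[2013], series[2021]
        years = 2021 - 2013
        print(
            f"{label}: {fold_increase(first, last):,.1f}-fold, "
            f"{annual_growth_rate(first, last, years):.1%} per year"
        )
```

A steady yearly rate is only a summary: the intermediate values reported above show that growth was uneven from year to year.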
Despite these efforts and accomplishments, the public data
portal has come under criticism for a variety of reasons:
the quantity of data is vast, yet essential data cannot be
found [10]; the data formats are inconvenient to use; and
the data is difficult to integrate because input
conventions and file formats differ between data sets [11].
4. Considerations for Designing Public Data-based
Commercial Services
This section covers the considerations that arise when
developing commercial services that make use of public
data published on public data portals.
4.1. How to Access Public Data
Currently, there are two general methods for using public
data.
The method using the Open API
This method is a kind of service mashup, an approach that
has been in the limelight since the Web 2.0 era. Its
advantage is that an individual or corporation does not
need to build a separate database in order to offer a
service based on the provided data. Currently, public data
is available through open APIs in XML and JSON formats, and
the majority of service APIs released in the last few years
support JSON for faster transmission and more efficient
processing.
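To illustrate the two response formats, the sketch below parses the same record from a JSON reply and from an XML reply using only the Python standard library. The payloads, the `items`/`item` layout, and the field names `stationName` and `pm10` are hypothetical; real open APIs define their own structures.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: the same record as an open API might return it.
JSON_REPLY = '{"items": [{"stationName": "Jinju", "pm10": "41"}]}'
XML_REPLY = (
    "<response><items><item>"
    "<stationName>Jinju</stationName><pm10>41</pm10>"
    "</item></items></response>"
)


def records_from_json(text):
    """Return the list of records found under the top-level 'items' key."""
    return [dict(item) for item in json.loads(text)["items"]]


def records_from_xml(text):
    """Return one dict per <item>, mapping child tag names to their text."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in item} for item in root.iter("item")]


if __name__ == "__main__":
    print(records_from_json(JSON_REPLY))
    print(records_from_xml(XML_REPLY))
```

Both parsers yield the same list of dictionaries, so the rest of a service can stay independent of the format an API happens to offer.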
The following issues arise while developing commercial
services utilizing the open API.
- Even if the cost per call is free, individual approval for
each API must be acquired.
- If the data must be frozen at a certain point in time and
the open API is not properly versioned, new data may be
used unintentionally, or old and new data may be mixed.
Because the service to be provided now depends on the
service quality of the open API, its own service quality
becomes more difficult to control. Of all the drawbacks,
the difficulty of managing service quality when the API
fails is one of the greatest impediments to creating
commercial services. One frequently used remedy is to cache
the results of the service's open API calls, for example in
an in-memory database. For frequently repeated API calls,
this provides performance and stability comparable to those
of building one's own database.
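The caching approach described above can be sketched as a small in-memory wrapper. This is an illustration under stated assumptions, not a production design: `CachedApi` is a hypothetical class, a plain dictionary stands in for the in-memory database, and serving a stale entry when the API call fails is an added design choice that the text does not prescribe.

```python
import time


class CachedApi:
    """Cache open API results in memory and serve stale data if the API fails."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable(key) -> result; performs the real API call
        self._ttl = ttl_seconds  # how long a cached result counts as fresh
        self._clock = clock      # injectable clock, which simplifies testing
        self._store = {}         # key -> (timestamp, result)

    def get(self, key):
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # fresh hit: no API call is made
        try:
            result = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]  # API failed: fall back to the stale copy
            raise                 # nothing cached: propagate the failure
        self._store[key] = (now, result)
        return result


if __name__ == "__main__":
    # Hypothetical fetch function standing in for a real open API call.
    api = CachedApi(lambda key: f"data for {key}", ttl_seconds=60)
    print(api.get("bus-arrivals"))
    print(api.get("bus-arrivals"))  # served from the cache
```

With a shared cache such as an external in-memory database, the same logic would let several service instances reuse each other's results.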
In general, unless cost reduction is the main objective, it
is preferable when establishing commercial services to
build one's own database from the data rather than to rely
on public data APIs.
The method using file data
If the volume of data is suitable for distribution as a
file, the data may be distributed in file form. In this
case, the advantages of the API-based method become
disadvantages, and its shortcomings become advantages.
One of the main advantages is that it is possible to create
a database with the same content as the data available
through the open API. This makes service stability simple
to manage, since it no longer depends on the stability of
external services.
When data is distributed as a file, it is often a
compressed MS Office Excel or CSV file. CSV files are often
used when building commercial services, since the data is
processed and loaded into a database by machines rather
than by humans.
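Building one's own database from a distributed CSV file can be sketched with the Python standard library. The example below is illustrative only: the table name `public_data`, the sample columns, and the choice to store every column as text are assumptions, and SQLite stands in for whatever database a service actually uses.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text, conn, table="public_data"):
    """Create `table` from the CSV header and insert every data row as text.

    Returns the number of inserted rows. Column and table names are quoted
    but otherwise trusted, so this sketch suits curated files only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical file content.
SAMPLE = "station,pm10\nJinju,41\nSeoul,35\n"

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load_csv_into_sqlite(SAMPLE, connection), "rows loaded")
    print(connection.execute("SELECT * FROM public_data").fetchall())
```

Once loaded, the service queries its own database, so an outage of the public portal no longer affects request handling.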
4.2. Considerations for Designing Public Data-based
Commercial Services
This section discusses the factors to consider when
building a service on CSV file data, the form chosen for
commercial service development because of its simplicity of
use.
The following issues are significant because much public
data is entered or refined manually. Errors in such loosely
structured data tend to surface while the service is
running rather than while the data is being processed, so
they need to be anticipated at the data processing stage.
Error in the CSV file itself
While the CSV format is simple to parse, errors in the
information structure are common because data fields are
divided only by a separator such as a comma. For example:
- If the content of a field contains the separator
character, a human reader can still tell the fields apart,
but a mechanical parser has difficulty doing so, which
results in improperly processed data or errors during
processing.
- Because field division is entirely dependent on the
existence or absence of a separator, a separator may be
accidentally added or omitted during the creation or
processing of a CSV file.
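Both problems listed above can be detected mechanically by comparing each row's field count with the header. The following is a minimal sketch using Python's `csv` module; the function name and the sample rows are hypothetical. It also shows that a separator inside a properly quoted field is parsed correctly, whereas an unquoted one shifts the fields.

```python
import csv
import io


def find_malformed_rows(csv_text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            problems.append((reader.line_num, len(row)))
    return problems


# Hypothetical file content with one well-formed and two malformed rows.
SAMPLE = (
    "name,address,capacity\n"
    'Library A,"12 Main St, Jinju",120\n'  # quoted comma: parsed as one field
    "Library B,34 River Rd, Jinju,80\n"    # unquoted comma: one field too many
    "Library C,56 Hill Rd\n"               # omitted separator: one field too few
)

if __name__ == "__main__":
    for line_number, count in find_malformed_rows(SAMPLE):
        print(f"line {line_number}: found {count} fields")
```

A field-count check cannot catch a row in which one separator was added and another omitted, so it complements rather than replaces content validation.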
Data Encoding Issues
In contrast to other file formats, CSV does not prescribe
a specific character encoding scheme. As a result, when no
metadata accompanies the CSV file, its encoding has to be
inferred. If the encoding is inconsistent across files or
is difficult to handle in a particular development
environment, the files should be converted to a single
encoding beforehand.
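One simple way to infer an encoding is to try a short list of candidates in order and keep the first that decodes without error. The sketch below assumes that files are either UTF-8 (with or without a byte order mark) or CP949, a legacy encoding commonly found in Korean files; the candidate list and function names are assumptions to adapt to the data at hand.

```python
def decode_with_fallback(raw, candidates=("utf-8-sig", "cp949")):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns (text, encoding). Trial decoding is a heuristic: a wrong
    encoding can sometimes decode without error, so strict encodings
    such as UTF-8 should come first in `candidates`.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def to_utf8(raw):
    """Convert bytes in any candidate encoding to UTF-8 without a byte order mark."""
    text, _ = decode_with_fallback(raw)
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy_bytes = "서울,1\n".encode("cp949")  # hypothetical legacy file content
    text, encoding = decode_with_fallback(legacy_bytes)
    print(encoding, repr(text))
```

Converting every incoming file with `to_utf8` before parsing gives the rest of the pipeline a single encoding to handle.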
Classification of Omissible Information
Not all elements of public data share the same data schema.
In that case, both the MS Office Excel file and the CSV
file are laid out according to the union of the elements'
schemas. As a result, each data element leaves some fields
blank, that is, filled with an empty value (represented in
CSV by consecutive separators).
A NoSQL database, which is well suited to unstructured
data, has no significant problem storing data elements that
lack certain fields. With an SQL database, however, it is
necessary to determine which fields are allowed to be null.
Even when a NoSQL database is used, the client must know
which fields are nullable when parsing results. Because the
CSV file does not carry this information, the nullable
fields have to be identified by inspecting the fields while
the data is processed.
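Identifying nullable fields by inspection can be sketched as a single pass over the file that records every column in which an empty value occurs. The function name and sample columns below are hypothetical, and treating whitespace-only values as empty is an assumption.

```python
import csv
import io


def find_nullable_columns(csv_text):
    """Return the columns, in header order, that are empty in at least one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nullable = set()
    for row in reader:
        for name, value in row.items():
            if name is None:
                continue  # surplus fields in an over-long row; not a column
            if value is None or value.strip() == "":
                nullable.add(name)  # a missing trailing field also counts as empty
    return [name for name in (reader.fieldnames or []) if name in nullable]


# Hypothetical file content: consecutive separators mark empty values.
SAMPLE = (
    "id,name,website,closed_on\n"
    "1,Library A,,\n"
    "2,Library B,example.org,\n"
)

if __name__ == "__main__":
    print(find_nullable_columns(SAMPLE))
```

The result reflects only the file that was inspected: a column that is always filled today may still turn out to be nullable in a later release of the data.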
Absence of Type Information
Each data field has a specific type, but the type is not
specified explicitly. Even if a field is simply named "ID"
or "code," it may hold a kind of serial number (e.g., an ID
of the form year + month + day + sequence number), and it
is desirable to preserve this type during processing for
later database storage and queries.
Because no type information is available for individual
fields, it is necessary to make an assumption about each
field during data processing and to infer the type by
checking whether the assumption holds for all of the
field's values. If the type is decided in this manner, it
is vital to account for the likelihood of problems in the
future, when the data is updated and the assumption no
longer holds.
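The inference procedure described above, assuming a type and checking it against every value, can be sketched as follows. The candidate types, their order, the ISO date format, and the rule that values with leading zeros stay text (so that codes and IDs are preserved) are all assumptions chosen for illustration.

```python
import re
from datetime import datetime

_INTEGER = re.compile(r"-?(0|[1-9]\d*)")           # rejects leading zeros
_REAL = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?")      # integers or decimals


def _is_integer(value):
    return _INTEGER.fullmatch(value) is not None


def _is_real(value):
    return _REAL.fullmatch(value) is not None


def _is_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Assumptions are tried from most to least specific.
_CHECKS = (("integer", _is_integer), ("real", _is_real), ("date", _is_date))


def infer_column_type(values):
    """Return the first type whose check holds for every non-empty value."""
    present = [value for value in values if value != ""]
    if not present:
        return "text"
    for type_name, check in _CHECKS:
        if all(check(value) for value in present):
            return type_name
    return "text"


if __name__ == "__main__":
    print(infer_column_type(["41", "35", ""]))
    print(infer_column_type(["00123", "00456"]))
```

Re-running the inference whenever the data is updated, and comparing the result with the stored schema, is one way to notice early that an assumption has stopped holding.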
4.3. Public Data Verification
Thanks to the structure of XML and JSON, verifying public
data provided over an open API is simple. For file data,
however, the issues identified above make data
reconstruction and batch processing challenging. In
particular, a data error often goes unnoticed during batch
processing and only surfaces as a malfunction while the
service is running. Such an error is also extremely likely
to be discovered only after a user reports it. It is
therefore critical to detect errors in the data before it
is loaded into or processed by the database.
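Detecting errors before the data reaches the database can be sketched as a validation gate that checks every row against an expected schema and reports all problems with their line numbers. The schema, column names, and validators below are hypothetical; a real service would derive them from the considerations in Section 4.2.

```python
import csv
import io

# Hypothetical schema: column name -> (required?, validator for non-empty values).
SCHEMA = {
    "id": (True, str.isdigit),
    "name": (True, lambda value: True),
    "capacity": (False, str.isdigit),
}


def validate_csv(csv_text, schema):
    """Return a list of (line_number, message) for every problem found."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    if header != list(schema):
        return [(1, f"unexpected header: {header}")]
    errors = []
    for row in reader:
        line = reader.line_num
        if len(row) != len(header):
            errors.append((line, f"expected {len(header)} fields, found {len(row)}"))
            continue
        for name, value in zip(header, row):
            required, is_valid = schema[name]
            if value == "":
                if required:
                    errors.append((line, f"missing required field '{name}'"))
            elif not is_valid(value):
                errors.append((line, f"invalid value for '{name}': {value!r}"))
    return errors


# Hypothetical file content containing three different kinds of error.
BAD = "id,name,capacity\n1,Library A,many\n,Library B,80\n3,Library C\n"

if __name__ == "__main__":
    for line, message in validate_csv(BAD, SCHEMA):
        print(f"line {line}: {message}")
```

Loading the file only when the returned list is empty keeps faulty rows out of the database, so errors are found during batch processing instead of being reported by users.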
5. CONCLUSIONS
Data is becoming an essential factor of the Fourth
Industrial Revolution. Following this trend, advanced
nations such as the United States and the United Kingdom
recognize the value of data, develop diverse strategies to
revitalize the data industry, and continue their efforts to
open public data. This paper investigated trends in
domestic and overseas public data, as well as the opening
and utilization of domestic public data. It also covered
the problems that arise when establishing commercial
services with real public data, together with points to
consider. The consumers of public data range from ordinary
citizens to professionals, and the purposes for which the
data is used vary significantly, from simple information
retrieval to commercial service development. It is
therefore difficult to satisfy every consumer with data in
one specific format or with one specific content; data
needs to be provided in a variety of forms and formats.
Additionally, to improve the use of public data in the
development of actual commercial services, public data will
need to be opened in a way that reflects the issues and
considerations presented in this paper.
REFERENCES
[1] D. Newman, “How to Plan, Participate and Prosper in
the Data Economy”, Gartner, 2011,
https://www.gartner.com/en/documents/1610514/how-to-plan-participate-and-prosper-in-the-data-economy
[2] D. Reinsel, J. Gantz, J. Rydning, “Data Age 2025: The
Evolution of Data to Life-Critical. Don’t Focus on Big
Data; Focus on the Data That’s Big”, An IDC White
Paper, Apr. 2017
[3] D. Reinsel, J. Gantz, J. Rydning, “Data Age 2025: The
Digitization of the World. From Edge to Core”, An IDC
White Paper, Nov. 2018
[4] Joint Ministry of Relations, “2021 National Key Data
Open Plan”, Public Data Strategy Committee, April
2021
[5] Joint Ministry of Relations, “2021 Public Data
Provision and Use Revitalization Implementation Plan
(draft)”, Public Data Strategy Committee, April 2021
[6] Joint Ministry of Relations, “Data Industry
Revitalization Strategy: I-KOREA 4.0 Data Sector Plan,
I-DATA”, 4th Industrial Revolution Committee, 2018
[7] Joint Ministry of Relations, “Measures to revitalize the
data platform. - From system platform to user service
platform –”, 4th Industrial Revolution Committee, June
2021
[8] R. Pollock, “Building the (Open) Data Ecosystem”,
Open Knowledge International Blog, 2011
[9] OECD, “OECD Open, Useful and Re-usable
data (OURdata) Index: 2019”, Mar. 2020
[10] NIA, “Research on Legislative Improvement for Public
Data-Based Industrial Ecosystem Creation”, Research
Report of NIA, 2017
[11] Tae-Yeop Kim, “The Current State of Public Data
Opening Policies and Future Tasks”, Issues and Points,
No. 1455, April 2018, National Assembly Legislative
Research Office
[12] OECD, “Open Government Data Report: Enhancing
Policy Maturity for Sustainable Impact”, OECD, 2018
[13] World Wide Web Foundation,
https://webfoundation.org/
[14] Act on Promotion of the Provision and Use of
Public Data,
https://elaw.klri.re.kr/eng_mobile/viewer.do?hseq=47133&type=part&key=4
[15] https://www.data.gov
[16] https://ckan.org/
[17] https://datausa.io/
[18] https://data.gov.uk/
[19] Great Britain, DBIS (Department for Business,
Innovation and Skills), “Seizing the data opportunity:
A strategy for UK data capability”, 2013
[20] https://theodi.org
[21] https://opencorporates.com/
[22] Open Government Data – OECD,
https://www.oecd.org/gov/digital-government/open-government-data.htm
[23] M. Young, The Technical Writer’s Handbook. Mill
Valley, CA: University Science, 1989.
[24] https://www.data.go.kr/
BIOGRAPHIES
Duckki Lee is currently an
Assistant Professor in the
Department of Smart Software,
Yonam Institute of Technology in
South Korea. His research
interests include big data systems
and public data analysis and
utilization.

The era we are entering is one of a data economy [1], in which data is a critical resource in addition to land, labor, and capital, and a data society [2, 3], in which all aspects of everyday life are data-driven. Companies that retain and use enormous amounts of data are at the forefront of market innovation, and artificial intelligence and robotics, which are fast-growing in all aspects of the nation and society, are also data-driven. Following this trend, advanced Western nations, notably the US, acknowledge the critical role of data in determining future competitiveness and are actively pursuing data hegemony via strategies and increased investment in the data industry [4-7]. Additionally, these nations are striving not only to enhance economic value through the opening of public data, but also to create social value through the active use of data to address pressing issues facing the country and society, such as transportation, the environment, health, hygiene, disaster preparedness, and safety. In keeping with this trend, the Korean government likewise pursues a state-of-the-art intelligent government for the twenty-first century and promotes the data economy. Additionally, to encourage the realization of social values through the use of data, legislation, system development, and portal site establishment are being vigorously supported. The accessibility of public data is required to expand people's access to it and to promote value creation via data use in the data economy and social activities. To ensure the success of the public data opening policy, the openness of public data is insufficient; instead, a data ecosystem in which it can be disseminated, exploited, and circulated must be established [8]. South Korea has been operating a public data portal[24] since 2013 to accomplish this. 
By making data uploaded by all central and local governments freely available to all users, including corporations and ordinary residents, the public data portal plays a critical role in the distribution and usage of data. Evaluations show that the portal has produced substantial outcomes: services that use public data continue to proliferate, and Korea topped the OECD's OUR Data Index, a national assessment of public data openness, in three consecutive editions (2015, 2017, and 2019) [9, 22]. However, public data portals also face a variety of criticism. The quantity of data is vast, yet the essential data cannot be found [10]; the format of the data is very inconvenient to use; or the data is extremely difficult to integrate owing to differing input or file formats [11].

This paper analyses domestic and international trends in public data and discusses the openness and use of domestic public data. It then highlights the issues and concerns that arise with public data when developing commercial services that make use of it.

2. Trends in Domestic and International Public Data

The term "public data" refers to information and data generated or authorized by the government. It is data that is freely supplied, reused, and distributed to anyone, and that users can use to produce their own creations [12]. The World Wide Web Foundation defines public data as data that is publicly accessible online, reusable, and machine-readable, and that enables huge volumes of data to be downloaded and used as a single dataset for free [13]. According to Article 2 of Korea's Public Data Act [14], public data refers to data or information processed optically or electronically by public institutions for the objectives specified by applicable laws and regulations. Public data is the data generated and maintained by governments and public institutions to achieve public objectives, such as conducting business and providing public services, and it is a critical resource with enormous potential value. Public data encompasses information on all residents, including their resident registration, income, property, medical treatment, tax payment, and real estate, as well as information about the weather, transportation, logistics, energy, and water and sewage systems that affect their everyday life.

US

The United States established data.gov [15], a unified public data portal, in 2009 as part of an open government initiative that incorporated the principle of opening government-held public data. The portal started with the release of 76 data sets held by 11 government agencies; as of February 2022, it exposes data from 48 state governments, 48 cities, and 152 government-affiliated institutions [15]. Data disclosure has remained active until recently, with 342,000 data sets in a variety of domains, including health, labor, education, transportation, and crime. The data is primarily available in the form of raw data sets, geodata sets, interactive data, source code, APIs and programs, and applications, and is updated monthly. Various data formats, such as xls, csv, and txt, are provided to allow for the reuse of information, and conversion to RDF is straightforward. In addition, the data can be reclassified, adjusted, and integrated with other datasets within the system to build mashups, which makes the system a powerful tool. The basis of data.gov is the 'Open Government Platform' (OGPL).

In 2013, Data.gov 2.0 was introduced, built on the open-source data platform CKAN [16]. As a consequence, data catalogue services have been established to connect open government data sites throughout the United States with those of other nations, states, and cities. The public data portal (data.gov) focuses on the integration and management of metadata through the use of CKAN. Additionally, public data is visualized using maps, and DATA USA [17] is available separately for comparing and analyzing major US cities.

Fig -1: US DATA.GOV and DATAUSA

UK

The United Kingdom has opened 51,957 datasets across 14 sectors, including economy, education, environment, defense, health, and transportation, through data.gov.uk [18], a public data portal established with the participation of Tim Berners-Lee, the inventor of the web and linked data. Since 2010, the UK has actively promoted an open data policy and operated an open data portal, data.gov.uk, which enables search and access to practically all public sector data. The UK government's open data policy aims to make information more accessible to the public and to advance the public interest through the use of open data for policy research [19]. The British government established the Open Data Institute (ODI) [20] in November 2012 as a non-profit organization dedicated to using public data to foster new enterprises and startups. It does so by providing support and developing talent for venture firms that are building new businesses around technology and services connected to public data. Approximately 30 startups have been formed as a result of ODI's startup support programs. OpenCorporates [21], a market leader, owns the world's biggest corporate disclosure database, which has 190 million corporate records. By providing its data through a search portal, the firm has received positive feedback from civic organizations and investors who want to monitor corporations.

Fig -2: UK DATA.GOV.UK and OpenCorporates

South Korea

With the enactment of public data legislation in 2013 [14], South Korea has been actively promoting an open data policy. To promote the openness and use of public data in South Korea, data.go.kr [24] was founded and is being managed as a public data portal that delivers integrated information. The portal makes public data available in a variety of forms, including file data, open APIs, and visualization, so that anyone can use it easily and can find the desired public data quickly and accurately through a convenient search. A data industry revival strategy headed by data-related governmental agencies has been established and is being promoted in South Korea. As a consequence, South Korea topped the OUR Data Index, the OECD's assessment of the openness of public data, in three consecutive editions (2015, 2017, and 2019) [9, 22]. Additionally, Korea was ranked 17th in 2014, 8th in 2015, and 5th in 2016 in the WWW Foundation's Open Data Barometer (ODB) [13].

Currently, 68,865 public data sets are accessible through the public data portal. These comprise 51,110 file-based data sets, 9,020 open APIs, and 8,735 standard data sets; the number of data sets for each field is shown in Table-1, and the national data map illustrating the proportion of data for each field is presented in Fig-3.

Table -1: The number of Datasets by Topics

Topics            Data Sets    Topics               Data Sets
Education         3925         Health Care          3719
Land              4270         Disaster Recovery    3512
Administration    9090         Transportation       5647
Finance           5027         Weather              5096
Industry          6473         Technology           2066
Social Services   4132         Agriculture          4201
Food              1997         Unification          964
Culture           8333         Law                  413

Fig -3: Korea National Data Map

3. Public Data Provision and Utilization Analysis in South Korea

In South Korea, the overall number of open data cases grew 12.8-fold, from 5,272 in 2013 to 24,588 in 2017, 28,400 in 2018, 33,600 in 2019, 55,139 in 2020, and 67,441 in 2021.

Chart -1: Public Data Provision Statistics

The number of public data openings is expanding fast as a result of the Korean government's efforts to open public data, and the number of public data uses is also increasing rapidly as more businesses utilize data. The number of public data users climbed from 13,923 in 2013 to 3,871,984 in 2017, 7,549,179 in 2018, 13,141,413 in 2019, 20,848,555 in 2020, and 33,340,436 in 2021. This growth represents a more than 2,394-fold increase over the initial figure.

Chart -2: Public Data Utilization Statistics

Despite these efforts and accomplishments, the public data portal has come under criticism for a variety of reasons. The quantity of data is vast, yet the essential data cannot be found [10]; the format of the data is very inconvenient to use; or the data is extremely difficult to integrate owing to differing input or file formats [11].
4. Considerations for Designing Public Data-based Commercial Services

This chapter covers the considerations that must be made when developing commercial services that use the public data published on public data portals.

4.1. How to Access Public Data

Currently, there are two general methods for using public data.

The method using the Open API

This is a form of service mashup that has been in the limelight since the Web 2.0 era. Its advantage is that an individual or corporation does not need to build a separate database in order to build a service on the given data. A variety of public data sources are available in XML and JSON formats, and the majority of service APIs released in the last few years support JSON for increased transmission speed and processing efficiency. The following issues arise while developing commercial services with the open API.

- Even if calls are free of charge, individual approval must be acquired for each API.
- If data must be frozen at a certain moment in time and version control of the open API is not correctly implemented, unintended new data may be used, or old and new data may be mixed.
- Since the service being provided depends on the service quality of the open API, it becomes more difficult to control service quality.

Among these drawbacks, one of the greatest impediments to creating commercial services is the difficulty of managing service quality when the open API fails. One frequently used method for resolving this issue is to cache the results of the open API calls made by the service in an in-memory database or a similar store.
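The caching approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `CachedApiClient`, the injected `fetch` callable (standing in for a real open API call), the default time-to-live, and the use of a plain in-process dictionary in place of an in-memory database are all hypothetical choices made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedApiClient:
    """Caches open API responses in memory and serves stale data on failure."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical callable that performs the real API call
        self._ttl = ttl_seconds  # how long a cached response counts as fresh
        self._clock = clock      # injectable clock, which keeps the cache testable
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # fresh hit: no upstream call is made
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]  # upstream failed: fall back to the stale copy
            raise                 # nothing cached, so the failure propagates
        self._store[key] = (now, value)
        return value
```

Serving a stale copy when the upstream call fails is one possible design choice; a service that must never show outdated data would instead surface the failure.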
With such a cache, performance and stability for frequently repeated API calls can approach those of a self-built database. In general, if cost reduction is not the main objective when establishing commercial services, it is preferable to build a dedicated database from the data rather than relying on public data APIs.

The method using file data

If the volume of data is suitable for distribution as a file, each data set may be distributed as a file. In this case, the advantages of the API-based method become disadvantages, while its shortcomings become advantages. One of the main advantages is that it is possible to create a database with the same content as the data available through the open API. This simplifies the management of service stability, since the service does not depend on the stability of external services. When data is distributed as a file, it is often a compressed MS Office Excel or CSV file. CSV files are often used when building commercial services, since the data is processed and loaded into a database by machines rather than humans.

4.2. Considerations for Designing Public Data-based Commercial Services

In this section, we discuss the factors to consider while building a service on CSV file data, which was selected for commercial service development owing to its simplicity of use. The following issues are significant because much public data is entered or refined manually and suffers from the resulting problems. Because errors in such data tend to surface in the running service rather than during processing, they need to be considered explicitly at the data processing stage.

Error in the CSV file itself

While the CSV format is simple to parse, errors in the information structure are common because data fields are divided by a separator such as a comma. For example:

- If the content of a field itself contains the separator, a human reader can still tell the fields apart, but a mechanical parser has difficulty doing so, which results in improper data processing or errors during processing.
- Because field division depends entirely on the presence or absence of a separator, a separator may be accidentally added or omitted when a CSV file is created or processed.

Data Encoding Issues

In contrast to other file formats, the CSV format does not prescribe a specific character encoding. As a result, unless additional metadata accompanies the CSV file, its encoding has to be inferred. If the encoding is inconsistent or is difficult to handle in a particular development environment, the files should be converted to a common encoding beforehand.
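The two CSV problems above (separator errors and unknown encoding) can be checked before any data reaches a database. The sketch below is illustrative only and uses Python's standard `csv` module. The function names `decode_with_fallback` and `parse_and_check` are hypothetical, and the candidate encodings (`utf-8-sig`, then `cp949`) are an assumption about what Korean public data files might use; the paper does not name specific encodings.

```python
import csv
import io
from typing import List, Sequence, Tuple


def decode_with_fallback(raw: bytes,
                         encodings: Sequence[str] = ("utf-8-sig", "cp949")) -> Tuple[str, str]:
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in encodings:  # the candidate list is an assumption, not from the paper
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the file")


def parse_and_check(text: str) -> Tuple[List[List[str]], List[Tuple[int, int]]]:
    """Parse CSV text; return (good_rows, problems).

    Each problem is (line number where the row ended, number of fields found).
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    good: List[List[str]] = [header]
    problems: List[Tuple[int, int]] = []
    for row in reader:
        if len(row) == len(header):
            good.append(row)  # quoted separators inside a field are handled by csv.reader
        else:
            problems.append((reader.line_num, len(row)))  # extra or missing separator
    return good, problems
```

A field that contains the separator parses correctly only if the producer quoted it; an unquoted separator shows up here as a row with the wrong field count, which is why such rows are collected for review rather than silently loaded.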
Classification of Omittable Information

Not all elements of public data share the same data schema. In this case, both the MS Office Excel file and the CSV file use the union of the elements' schemas. As a result, some fields are left blank for a given data element, that is, filled with an empty value represented by consecutive separators. With a NoSQL database, which is well suited to unstructured data, storing data elements that lack certain fields poses no significant issue. With SQL databases, however, it is necessary to determine which fields are allowed to be null. Even when a NoSQL database is used, the client must know which fields are nullable when parsing results. Because the CSV file does not carry this information, the nullable fields have to be identified by inspecting the data when it is processed.

Absence of Type Information

Each data field has a specific type, but this is not specified explicitly. Even if a field is simply named "ID" or "code," when it is a type of serial number (e.g., an ID in the form of year + month + day + high order), that type can be preserved during processing for later database storage and querying. Because no type information exists for individual fields, it is necessary to make an assumption during data processing and infer the type by checking whether the assumption holds for every value of the field. If the type is decided in this manner, it is vital to account for the possibility that the assumption no longer holds when the data is updated in the future.

4.3. Public Data Verification

Due to the characteristics of XML and JSON, verifying public data provided over an open API is simple. For file data, however, the issues identified above make data reconstruction and batch processing challenging. Specifically, a data error tends to surface as a malfunction in the running service rather than during batch processing. Additionally, such an error is very likely to be discovered only after a user reports it. As a result, it is critical to detect errors in the data before it is loaded into or processed in the database.

5. CONCLUSIONS

Data is becoming an essential and critical factor of the Fourth Industrial Revolution. Following this trend, advanced nations such as the United States and the United Kingdom recognize the value of data, develop diverse strategies to rejuvenate the data industry, and make continued efforts to open public data. This paper investigated trends in domestic and overseas public data, as well as the opening and utilization of domestic public data. It also covered, using real public data, the problems that arise when building commercial services and the points to consider. The consumers of public data range from ordinary citizens to professionals, and the purposes for which the data is used vary significantly, from simple information retrieval to commercial service development. As a result, data cannot be delivered in one specific format or with content tailored to a single kind of consumer; a variety of data must be provided in a variety of formats. Additionally, to improve the use of public data in the development of actual commercial services, it will be necessary to open public data in a way that reflects the issues and considerations presented in this paper.

REFERENCES

[1] D. Newman, “How to Plan, Participate and Prosper in the Data Economy”, Gartner, 2011, https://www.gartner.com/en/documents/1610514/how-to-plan-participate-and-prosper-in-the-data-economy
[2] D. Reinsel, J. Gantz, J. Rydning, “Data Age 2025: The Evolution of Data to Life-Critical. Don’t Focus on Big Data; Focus on the Data That’s Big”, An IDC White Paper, Apr. 2017.
[3] D. Reinsel, J. Gantz, J. Rydning, “Data Age 2025: The Digitization of the World. From Edge to Core”, An IDC White Paper, Nov. 2018.
[4] Joint Ministry of Relations, “2021 National Key Data Open Plan”, Public Data Strategy Committee, April 2021
[5] Joint Ministry of Relations, “2021 Public Data Provision and Use Revitalization Implementation Plan (draft)”, Public Data Strategy Committee, April 2021
[6] Joint Ministry of Relations, “Data Industry Revitalization Strategy: I-KOREA 4.0 Data Sector Plan, I-DATA”, 4th Industrial Revolution Committee, 2018
[7] Joint Ministry of Relations, “Measures to revitalize the data platform: From system platform to user service platform”, 4th Industrial Revolution Committee, June 2021
[8] R. Pollock, “Building the (Open) Data Ecosystem”, Open Knowledge International Blog, 2011
[9] OECD, “OECD Open, Useful and Re-usable data (OURdata) Index: 2019”, Mar. 2020
[10] NIA, “Research on Legislative Improvement for Public Data-Based Industrial Ecosystem Creation”, Research Report of NIA, 2017
[11] Tae-Yeop Kim, “The Current State of Public Data Opening Policies and Future Tasks”, Issues and Points, No. 1455, April 2018, National Assembly Legislative Research Office
[12] OECD, “Open Government Data Report: Enhancing Policy Maturity for Sustainable Impact”, OECD, 2018
[13] World Wide Web Foundation, https://webfoundation.org/
[14] ACT ON PROMOTION OF THE PROVISION AND USE OF PUBLIC DATA, https://elaw.klri.re.kr/eng_mobile/viewer.do?hseq=47133&type=part&key=4
[15] https://www.data.gov
[16] https://ckan.org/
[17] https://datausa.io/
[18] https://data.gov.uk/
[19] Great Britain, DBIS (Department for Business, Innovation and Skills), “Seizing the data opportunity: A strategy for UK data capability”, 2013
[20] https://theodi.org
[21] https://opencorporates.com/
[22] Open Government Data – OECD, https://www.oecd.org/gov/digital-government/open-government-data.htm
[23] M. Young, The Technical Writer’s Handbook. Mill Valley, CA: University Science, 1989.
[24] https://www.data.go.kr/

BIOGRAPHIES

Duckki Lee is currently an Assistant Professor in the Department of Smart Software, Yonam Institute of Technology in South Korea. His research interests include big data systems and public data analysis and utilization.