Module 5
Dr. K V N LAKSHMI
What is data processing?
• Data processing involves transforming raw data into useful
information
• Stages of data processing include collection, filtering,
sorting, and analysis
• Data processing relies on various tools and techniques to
ensure accurate, valuable output
Data collection
• The first stage of data collection involves gathering and
discovering raw data from various sources, such as sensors,
databases, or customer surveys. It is essential to ensure the
collected data is accurate, complete, and relevant to the
analysis or processing goals. Care must be taken to avoid
selection bias, where the method of collecting data
inadvertently favors certain outcomes or groups, potentially
skewing results and leading to inaccurate conclusions.
Data preparation
• Once the data is collected, it moves to the data preparation
stage. Here, the raw data is cleaned up, organized, and
often enriched for further processing. This stage involves
checking for errors, removing any bad data (redundant,
incomplete, or incorrect), and enhancing the dataset with
additional relevant information from external sources, a
process known as data enrichment. Data preparation aims
to create high-quality, reliable, and comprehensive data for
subsequent processing steps.
Data input
• The next stage is data input. In this stage, the clean and
prepped data is fed into a processing system, which could
be software or an algorithm designed for specific data types
or analysis goals. Various methods, such as manual entry,
data import from external sources, or automatic data
capture, can be used to input data into the processing
system.
Data processing
• In the data processing stage, the input data is transformed,
analyzed, and organized to produce relevant information.
Several data processing techniques, like filtering, sorting,
aggregation, or classification, may be employed to process
the data. The choice of methods depends on the desired
outcome or insights from the data.
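As a small illustration, below is a minimal pandas sketch of these four techniques; the dataset and column names (region, sales) are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical sales records; column names are illustrative only.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales":  [250, 400, 310, 150, 500],
})

# Filtering: keep only rows that meet a condition.
high_sales = df[df["sales"] > 200]

# Sorting: order rows by a column.
sorted_df = df.sort_values("sales", ascending=False)

# Aggregation: summarise values per group.
totals = df.groupby("region")["sales"].sum()

# Classification: assign each row to a category.
df["tier"] = pd.cut(df["sales"], bins=[0, 200, 400, float("inf")],
                    labels=["low", "medium", "high"])
print(totals)
```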
Data output and interpretation
• The data output and interpretation stage deals with
presenting the processed data in an easily digestible format.
This could involve generating reports, graphs, or
visualizations that simplify complex data patterns and help
with decision-making. Furthermore, the output data should
be interpreted and analyzed to extract valuable insights and
knowledge.
Data storage
• Finally, in the data storage stage, the processed information
is securely stored in databases or data warehouses for
future retrieval, analysis, or use. Proper storage ensures
data longevity, availability, and accessibility while
maintaining data privacy and security.
Batch processing
• Batch processing involves handling large volumes of data
collectively at predetermined times, making it ideal for non-
time-sensitive tasks. This method allows organizations to
efficiently manage data by aggregating it and processing it
during off-peak hours to minimize the impact on daily
operations.
• Example: Financial institutions batch process checks and
transactions overnight, updating account balances in one
comprehensive sweep to ensure accuracy and efficiency.
Real-time processing
• Real-time processing is essential for tasks that require
immediate handling of data upon receipt, providing instant
processing and feedback. This type of processing is crucial
for applications where delays cannot be tolerated, ensuring
timely decisions and responses.
• Example: GPS navigation systems rely on real-time
processing to offer turn-by-turn directions, adjusting routes
based on live traffic and road conditions to ensure the
fastest path.
Multiprocessing (parallel processing)
• Multiprocessing, or parallel processing, involves utilizing
multiple processing units or CPUs to handle various tasks
simultaneously. This approach allows for more efficient data
processing, particularly for complex computations that can
be broken down into smaller, concurrent tasks, thereby
speeding up overall processing time.
• Example: Movie production often utilizes multiprocessing for
rendering complex 3D animations. By distributing the
rendering across multiple computers, the overall project's
completion time is significantly reduced, leading to faster
production cycles and improved visual quality.
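A minimal Python sketch of the same idea is shown below, splitting an illustrative CPU-bound task across several worker processes; the render_frame function is only a stand-in for independent, expensive work, not an actual renderer.

```python
from multiprocessing import Pool

def render_frame(frame_number):
    # Stand-in for an expensive, independent computation (e.g. one frame).
    return sum(i * i for i in range(frame_number * 10_000))

if __name__ == "__main__":
    frames = range(1, 9)
    # Distribute the independent tasks across multiple CPU cores.
    with Pool(processes=4) as pool:
        results = pool.map(render_frame, frames)
    print(len(results), "frames processed")
```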
Online processing
• Online processing facilitates the interactive processing of
data over a network, with continuous input and output for
instant responses. It enables systems to handle user
requests immediately, making it an essential component of
e-commerce and online services.
• Example: Online banking systems utilize online processing
for real-time financial transactions, allowing users to
transfer funds, pay bills, and check account balances with
immediate updates.
Manual data processing
• Manual data processing requires human intervention for the
input, processing, and output of data, typically without the
aid of electronic devices. This labor-intensive method is
prone to errors but was common before the advent of
computerized systems.
• Example: Before the widespread use of computers, libraries
cataloged books manually, requiring librarians to carefully
record each book's details by hand for inventory and
retrieval purposes.
Mechanical data processing
• Mechanical data processing uses machines or equipment to
manage and process data tasks, a prevalent method before
the digital era. This approach involved using tangible,
mechanical devices to input, process, and output data.
• Example: Voting in the early 20th century often involved
mechanical lever machines, where votes were tallied by
pulling levers for each choice, simplifying vote counting and
reducing the potential for errors.
Electronic data processing
• Electronic data processing employs computers and digital
technology to process, store, and communicate data with
efficiency and accuracy. This modern approach to data
handling allows for rapid processing speeds, vast storage
capabilities, and easy data retrieval.
• Example: Retailers use electronic data processing at
checkouts, where barcode scans instantly update inventory
systems and process sales, enhancing checkout speed and
inventory management.
Distributed processing
• Distributed processing involves spreading computational
tasks across multiple computers or devices to improve
processing speed and reliability. This method leverages the
collective power of various systems to handle large-scale
processing tasks more efficiently than could be achieved
with a single computer.
• Example: Video streaming services use distributed
processing to deliver content efficiently. By storing videos
on multiple servers, they ensure smooth playback and quick
access for users worldwide.
Cloud computing
• Cloud computing offers computing resources, such as
servers, storage, and databases, over the internet, providing
flexibility and scalability. This model enables users to access
and utilize computing resources as needed, without the
burden of maintaining physical infrastructure.
• Example: Small businesses leverage cloud computing for
data storage and software services, avoiding the need for
significant upfront hardware investments and allowing easy
scaling as the business grows.
Automatic data processing
• Automatic data processing uses software to automate
routine tasks, reducing the need for manual input and
increasing operational efficiency. This method streamlines
repetitive processes, minimizes human error, and frees up
personnel for more strategic tasks.
• Example: Automated billing systems in telecommunications
automatically calculate and send out monthly charges to
customers, streamlining billing operations and reducing
errors.
Data preparation
• Data preparation is the process of cleaning and
transforming raw data prior to processing and analysis. It is
an important step that often involves reformatting data,
correcting data, and combining datasets to enrich the data.
• Data preparation is often a lengthy undertaking for data
engineers or business users, but it is essential as a
prerequisite to put data in context in order to turn it into
insights and eliminate bias resulting from poor data quality.
Benefits of data preparation in the cloud
• Fix errors quickly — Data preparation helps catch errors before processing. After data has been
removed from its original source, these errors become more difficult to understand and correct.
• Produce top-quality data — Cleaning and reformatting datasets ensures that all data used in
analysis will be of high quality.
• Make better business decisions — Higher-quality data that can be processed and analyzed more
quickly and efficiently leads to more timely, efficient, better-quality business decisions.
• Additionally, as data and data processes move to the cloud, data preparation moves with it for even
greater benefits, such as:
• Superior scalability — Cloud data preparation can grow at the pace of the business. Enterprises
don’t have to worry about the underlying infrastructure or try to anticipate how it will evolve.
• Future proof — Cloud data preparation upgrades automatically so that new capabilities or
problem fixes can be turned on as soon as they are released. This allows organizations to stay
ahead of the innovation curve without delays and added costs.
• Accelerated data usage and collaboration — Doing data prep in the cloud means it is always on,
doesn’t require any technical installation, and lets teams collaborate on the work for faster results.
Data preparation steps
Questionnaire checking
• The data preparation process begins with finding the right data. This can
come from an existing data catalog, or data sources can be added ad hoc.
• Check whether each returned questionnaire is acceptable, i.e., whether it is
complete and properly filled in.
Data editing
• Data editing is the application of checks to detect missing, invalid or
inconsistent entries or to point to data records that are potentially in
error. No matter what type of data you are working with, certain edits are
performed at different stages or phases of data collection and processing.
• Detect errors and omissions
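As a small illustration, here is a pandas sketch of such edit checks; the survey columns (age, satisfaction) and their valid ranges are assumptions made purely for the example.

```python
import pandas as pd

responses = pd.DataFrame({
    "age":          [25, 34, None, 190, 41],   # one missing, one invalid entry
    "satisfaction": [4, 5, 3, 2, 7],            # valid range assumed to be 1-5
})

# Detect missing entries.
missing = responses[responses["age"].isna()]

# Detect invalid entries (values outside a plausible range).
invalid_age = responses[(responses["age"] < 0) | (responses["age"] > 120)]
invalid_sat = responses[~responses["satisfaction"].between(1, 5)]

print("Missing:", len(missing), "| Invalid age:", len(invalid_age),
      "| Invalid satisfaction:", len(invalid_sat))
```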
Data preparation steps
Data coding: Converting data into codes.
Process of assigning numerical values to responses that are
originally in a given format such as numerical, text, audio or
video. The main objective is to facilitate the automatic
treatment of data for analytical purposes.
Coded data can be analyzed using statistical software tools.
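A minimal sketch of coding Likert-style text responses into numeric codes with pandas follows; the response categories and the code values are illustrative assumptions.

```python
import pandas as pd

answers = pd.Series(["Agree", "Strongly agree", "Neutral", "Disagree", "Agree"])

# Codebook: each response category is assigned a numerical code.
codebook = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
            "Agree": 4, "Strongly agree": 5}

coded = answers.map(codebook)
print(coded.tolist())   # [4, 5, 3, 2, 4]
```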
Data preparation steps
Data classification:
• Data classification is the practice of organizing and categorizing data elements according
to pre-defined criteria. Classification makes data easier to locate and retrieve. Classifying
data is instrumental in promoting risk management, security, and regulatory compliance.
• Steps for Effective Data Classification
• Understand the Current Setup: Taking a detailed look at the location of current data
and all regulations that pertain to your organization is perhaps the best starting point for
effectively classifying data. You must know what data you have before you can classify it.
• Creating a Data Classification Policy: Staying compliant with data protection principles
in an organization is nearly impossible without proper policy. Creating a policy should be
your top priority.
• Prioritize and Organize Data: Now that you have a policy and a picture of your current
data, it’s time to properly classify the data. Decide on the best way to tag your data based
on its sensitivity and privacy.
Data preparation steps
Classification is of two types (the class-interval case is sketched below):
• According to attribute, e.g. literacy rate, honesty, beauty, weight, height
• According to class-interval, e.g. income, production, age, and sometimes weight and height
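A minimal sketch of classification according to class-interval, grouping a hypothetical income variable into intervals with pandas; the interval boundaries are assumptions chosen for illustration.

```python
import pandas as pd

income = pd.Series([12000, 27000, 45000, 61000, 88000, 30500])

# Class intervals (boundaries are illustrative assumptions).
bins = [0, 25000, 50000, 75000, 100000]
labels = ["0-25k", "25k-50k", "50k-75k", "75k-100k"]

income_class = pd.cut(income, bins=bins, labels=labels)
print(income_class.value_counts().sort_index())
```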
Data preparation steps
Tabulation:
Tabulation is a method of presenting numeric data in rows
and columns in a logical and systematic manner to aid
comparison and statistical analysis. It allows for easier
comparison by putting relevant data closer together, and it
aids in statistical analysis and interpretation.
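As a small illustration, here is a pandas sketch that tabulates hypothetical survey data into rows and columns with marginal (grand) totals; the variables gender and response are assumptions made for the example.

```python
import pandas as pd

survey = pd.DataFrame({
    "gender":   ["F", "M", "F", "M", "F", "M"],
    "response": ["Yes", "No", "Yes", "Yes", "No", "No"],
})

# Cross-tabulation with row and column (grand) totals.
table = pd.crosstab(survey["gender"], survey["response"], margins=True)
print(table)
```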
IMPORTANCE OF TABULATION
• Information or statistics presented in a table should be divided into different
dimensions, and for each dimension the grand totals and sub-totals should be clearly
shown, so that the associations between the different dimensions of the data in the
tabular form are easy to understand.
• The statistics should be arranged in a systematic manner, with headings and proper
numbering, which helps readers recognise the relevance of the table to the research.
• Tabulation puts the data into concise form; as a result, it helps the reader
understand it easily. The same data can also be presented in the form of graphs, charts,
flow charts, or diagrams.
• Tabular form presents the numerical figures in an attention-grabbing way.
• It reduces complex data to a simpler form, and as a result it becomes easy to
categorise the data.
IMPORTANCE OF TABULATION
• Arranging data in tabular form helps in detecting mistakes.
• Tables help in condensing the information and make it easy to
examine the contents.
• Tabulation is an economical way to present data; it minimises time
and, in turn, helps the researcher work more effectively.
• With modern tools, tabular presentation can easily summarise large
volumes of scattered data in a systematic form.
Tabulation
Data preparation steps
Graphical representation refers to the use of charts and
graphs to visually display, analyze, clarify, and interpret
numerical data, functions, and other qualitative structures.
Stem and leaf plot
• A stem and leaf plot is used to organize data as they are
collected. A stem and leaf plot looks something like a bar
graph. Each number in the data is broken down into a stem
and a leaf, thus the name.
• Ex. 1: 15, 27, 8, 17, 13, 17, 22, 24, 25, 14, 13, 36, 22, 22, 32, 32, 28, 7
• Ex. 2: 72, 85, 89, 93, 88, 109, 115, 97, 102, 113
• Ex. 3: 1.2, 2.3, 1.5, 1.6, 1.8, 2.7, 3.2, 3.6, 4.5, 7.8, 7.1, 10.6, 11.5
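A minimal sketch that builds the stem and leaf plot for Ex. 1, using the tens digit as the stem and the units digit as the leaf:

```python
from collections import defaultdict

data = [15, 27, 8, 17, 13, 17, 22, 24, 25, 14, 13, 36, 22, 22, 32, 32, 28, 7]

stems = defaultdict(list)
for value in sorted(data):
    stems[value // 10].append(value % 10)   # stem = tens digit, leaf = units digit

for stem in sorted(stems):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")

# Expected output:
# 0 | 7 8
# 1 | 3 3 4 5 7 7
# 2 | 2 2 2 4 5 7 8
# 3 | 2 2 6
```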
Data preparation steps
Data cleaning: the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data
within a dataset.
Deduplication
• Deduplication refers to a method of eliminating a dataset's
redundant data. In a secure data deduplication process, a
deduplication assessment tool identifies extra copies of data
and deletes them, so a single instance can then be stored.
Data deduplication software analyses data to identify
duplicate byte patterns.
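A minimal pandas sketch of cleaning and deduplicating a small, hypothetical dataset follows; the column names are illustrative. Real deduplication tools work at the byte level, as noted above, but the row-level idea is the same.

```python
import pandas as pd

records = pd.DataFrame({
    "customer_id": [101, 102, 102, 103, 104],
    "email": ["a@x.com", "b@x.com", "b@x.com", None, "d@x.com"],
})

# Data cleaning: drop rows with missing values.
cleaned = records.dropna()

# Deduplication: keep a single instance of each duplicate record.
deduplicated = cleaned.drop_duplicates()
print(deduplicated)
```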
What is ANOVA?
• ANOVA, or Analysis of Variance, is a test used to determine
differences between research results from three or more unrelated
samples or groups
• The key word in ‘Analysis of Variance’ is the last one. ‘Variance’
represents the degree to which numerical values of a particular
variable deviate from its overall mean. You could think of the
dispersion of those values plotted on a graph, with the average being
at the centre of that graph. The variance provides a measure of how
scattered the data points are from this central value.
• H0: There is no difference between the group means
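A minimal sketch of a one-way ANOVA with SciPy; the three groups and their values are illustrative assumptions.

```python
from scipy import stats

# Hypothetical measurements from three unrelated groups.
group_a = [23, 25, 28, 30, 27]
group_b = [31, 33, 29, 35, 34]
group_c = [22, 20, 25, 23, 24]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) would lead to rejecting H0 of equal group means.
```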
The Chi squared tests
• What Is Goodness-of-Fit:
The term goodness-of-fit refers to a statistical test that determines how
well sample data fits the distribution expected from the population. Put
simply, it tests whether a sample is skewed or represents the data you would
expect to find in the actual population.
H0: The observed frequencies do not differ from the expected distribution
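A minimal sketch of a chi-squared goodness-of-fit test with SciPy; the observed and expected counts are illustrative assumptions.

```python
from scipy import stats

observed = [18, 22, 20, 25, 15]          # observed frequencies per category
expected = [20, 20, 20, 20, 20]          # frequencies expected under H0

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
# A large p-value means the sample is consistent with the expected distribution.
```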
T-test
• A t-test is a statistical tool that compares the means of two groups or
the mean of a group to a standard value. It's also known as a
Student's t-test, t-statistic, or t-distribution
• H0: There is no difference between the group means
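A minimal sketch of an independent two-sample t-test with SciPy; the two groups of scores are illustrative assumptions.

```python
from scipy import stats

group_1 = [68, 72, 75, 70, 74, 69]
group_2 = [78, 82, 80, 77, 85, 79]

t_stat, p_value = stats.ttest_ind(group_1, group_2)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value would lead to rejecting H0 that the two group means are equal.
```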
One-Sample Proportion Test
• The One-Sample Proportion Test is used to assess whether a
population proportion (P1) is significantly different from a
hypothesized value (P0). This is called the hypothesis of inequality.
• H0: The population proportion does not differ from the hypothesized value (P1 = P0)
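A minimal sketch of a one-sample proportion z-test computed from its standard formula; the sample figures and the hypothesized proportion are illustrative assumptions.

```python
from math import sqrt
from scipy import stats

successes, n = 130, 200        # observed successes out of n trials (assumed)
p0 = 0.60                      # hypothesized population proportion P0 (assumed)

p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))   # two-sided test
print(f"z = {z:.3f}, p = {p_value:.4f}")
# A small p-value would lead to rejecting H0 that P1 = P0.
```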
Correlational test
• A correlational test, also known as correlation analysis, is a statistical
method that measures the strength and direction of the relationship
between two or more variables. The results of a correlational test are
summarized as a correlation coefficient, which is a number between -
1 and +1. The value of the coefficient indicates the strength of the
relationship, and the sign indicates the direction
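A minimal sketch of a Pearson correlation test with SciPy; the paired values are illustrative assumptions.

```python
from scipy import stats

hours_online = [1, 2, 3, 4, 5, 6, 7]
stress_score = [2, 3, 3, 5, 6, 6, 8]

r, p_value = stats.pearsonr(hours_online, stress_score)
print(f"r = {r:.3f}, p = {p_value:.4f}")
# r close to +1 or -1 indicates a strong relationship; the sign gives its direction.
```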
Hypothesis:
• The effect of social media on mental well-being does not significantly
vary based on the frequency of its usage.
• ANOVA: Average Daily Social Media Usage

                    Sum of Squares    df   Mean Square        F     Sig.
  Between Groups            52.737     4        13.184   14.688     .000
  Within Groups             82.582    92          .898
  Total                    135.320    96

• Since the F statistic (14.688) is significant at p < .001, the null hypothesis is rejected:
the effect on well-being varies with the frequency of social media usage.
Hypothesis
• H0: The level of social media addiction does not differ significantly
between gender.
Chi-Square Tests

                                  Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                  (2-sided)    (2-sided)    (1-sided)
  Pearson Chi-Square             .029 (a)    1        .865
  Continuity Correction (b)      .000        1       1.000
  Likelihood Ratio               .029        1        .865
  Fisher's Exact Test                                              1.000         .516
  Linear-by-Linear Association   .029        1        .866
  N of Valid Cases                 97

  a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 16.60.
  b. Computed only for a 2x2 table

• Since the Pearson chi-square significance (.865) is well above .05, the null hypothesis is not
rejected: the level of social media addiction does not differ significantly between genders.
Hypothesis
• There is no relation between the type of accommodation and
spending more time on social media than intended.