ML using minimal to no coding-KNIME.pptx
Step 1
• Read Data from File
– File Reader Node
– Table Reader Node
– Excel Reader Node
– Absolute and Relative Paths: the knime:// Protocol
• Accessing REST Services
Step 2
• ETL and Data Manipulation
– 2.1 Row and Column Filtering
– 2.2 Aggregations
– 2.3 Join and Concatenation
– 2.4 Transformation: Conversion, Replacement,
Standardization, and New Feature Generation
– 2.5 Data Preparation for Time Series Analysis
2.1 Row and Column Filtering
• Basic Row Filter
• Advanced Row Filter
• Column Filter
2.2 Aggregations
• Classic aggregations with the GroupBy node: a
classic aggregation operation consists of two
steps:
– identifying the data groups and
– applying the aggregation method to each of the
selected groups.
• Basic GroupBy aggregation
• Advanced GroupBy aggregation
• Pivoting
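The two steps of a classic aggregation can be sketched outside KNIME in plain Python. This is only an illustration of the idea, not how the GroupBy node is implemented; the sample rows and the helper names `group_by` and `aggregate` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for a few records of adult.csv (values are made up).
rows = [
    {"sex": "Female", "income": ">50K", "age": 40},
    {"sex": "Female", "income": ">50K", "age": 50},
    {"sex": "Male", "income": "<=50K", "age": 30},
    {"sex": "Female", "income": "<=50K", "age": 25},
]


def group_by(rows, keys):
    """Step 1: identify the data groups from the values of the key columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return groups


def aggregate(groups, column):
    """Step 2: apply the aggregation (row count and mean) to each group."""
    return {
        key: {"count": len(members), "mean": mean(r[column] for r in members)}
        for key, members in groups.items()
    }


summary = aggregate(group_by(rows, ["sex", "income"]), "age")
for key, stats in summary.items():
    print(key, stats)
```

Grouping by (sex, income) and aggregating "age" mirrors the exercise that follows, with the grouping columns and the aggregation column chosen separately.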
Read the adult.csv data set. Then:
1. Calculate the total number of rows and the average age for all
females with income >50K per year.
2. For each of the 4 groups defined by the sex and
income values, calculate the average of all numerical
columns.
3. On the full input table, count:
   1. rows with missing values in column “occupation”
   2. all rows in column “occupation”
   3. rows with no missing value in column “occupation”
   4. all rows in another column (e.g. marital-status). Notice that
   this number should be the same as the number of all
   rows in column “occupation”.
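The relationship between the counts in step 3 can be checked with a small sketch. The two columns below are hypothetical, and `None` is assumed to mark a missing value (the raw adult.csv file may encode missing values differently).

```python
# Hypothetical columns of equal length; None stands for a missing value (assumption).
occupation = ["Sales", None, "Tech-support", None, "Sales"]
marital_status = ["Married", "Divorced", "Married", "Never-married", "Widowed"]

missing = sum(1 for value in occupation if value is None)
total = len(occupation)  # counts every row, missing or not
present = total - missing  # rows with a non-missing value

print(missing, total, present)
# The total row count does not depend on which column is counted.
print(total == len(marital_status))
```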
Pivoting
• The pivoting function requires one or more
grouping columns to define the rows, and one or
more pivoting columns to define the columns of
the pivot table.
• The rows and columns define unique sub-groups
of the data. These sub-groups can then be
summarized by aggregated measures.
• The possible aggregations range from listing and
counting values, to calculations on date & time,
and to statistical measures.
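As a rough sketch of what a pivot table with a count aggregation contains, the plain-Python example below uses one grouping column for the rows and one pivoting column for the columns. The sample data and the column name `age_bin` are hypothetical, and the code does not reproduce the Pivoting node itself.

```python
from collections import Counter

# Hypothetical records: "age_bin" is the grouping column, "workclass" the pivoting column.
rows = [
    {"age_bin": "20-40", "workclass": "Private"},
    {"age_bin": "20-40", "workclass": "Private"},
    {"age_bin": "20-40", "workclass": "State-gov"},
    {"age_bin": "40-60", "workclass": "Private"},
]

# Each (row key, column key) pair is one unique sub-group; count its members.
counts = Counter((r["age_bin"], r["workclass"]) for r in rows)

row_keys = sorted({r["age_bin"] for r in rows})
col_keys = sorted({r["workclass"] for r in rows})

# Build the pivot table; combinations that never occur get a count of 0.
pivot = {
    rk: {ck: counts.get((rk, ck), 0) for ck in col_keys}
    for rk in row_keys
}

for rk in row_keys:
    print(rk, pivot[rk])

# The largest cell is the most common combination of the two columns.
most_common_group, size = counts.most_common(1)[0]
print(most_common_group, size)
```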
Question 1. Using the “age” column as the
grouping column and “workclass” column as
the pivoting column, calculate the number of
people in groups according to their work class
and age.
• 1a. What is the most common combination
of age bin and work class?
• 1b. How many people belong to this group?
2.3 Join and Concatenation
• Join:
– inner join,
– right outer join,
– left outer join,
– full outer join
• Concatenation
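The four join modes differ only in which unmatched rows they keep. The sketch below shows this with a hypothetical `join` helper over lists of dictionaries; it is an illustration, not the Joiner node's implementation. In unmatched rows the columns of the other table are simply absent here, whereas a table tool would fill them with missing values.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is "inner", "left", "right", or "full"."""
    right_index = {}
    for r in right:
        right_index.setdefault(r[key], []).append(r)

    result, matched_keys = [], set()
    for l in left:
        partners = right_index.get(l[key], [])
        if partners:
            # Matching rows appear in every join mode.
            matched_keys.add(l[key])
            result.extend({**l, **r} for r in partners)
        elif how in ("left", "full"):
            # Unmatched left rows are kept by left outer and full outer joins.
            result.append(dict(l))

    if how in ("right", "full"):
        # Unmatched right rows are kept by right outer and full outer joins.
        result.extend(dict(r) for r in right if r[key] not in matched_keys)
    return result


# Hypothetical tables sharing the key column "id".
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]

for how in ("inner", "left", "right", "full"):
    print(how, join(left, right, "id", how))
```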
• Read adult.csv data set. Then calculate the
average age and number of rows for the 4
groups defined by (sex, income) and join the
corresponding 2 aggregated values to each
row in the group.
• Differentiate joining from concatenation
• Read adult.csv data set. Then extract people
with age between 20 and 40 and working in a
work group starting with "S" and people with
age between 40 and 60 and working in the
Private sector (workclass starts with "P"). Put
both groups in a single data table.
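Concatenation stacks tables with the same columns on top of each other, whereas a join places columns side by side for matching rows. A minimal sketch of the filter-then-concatenate exercise, using hypothetical rows and assuming that "between" includes both bounds:

```python
# Hypothetical records standing in for adult.csv.
rows = [
    {"age": 25, "workclass": "State-gov"},
    {"age": 35, "workclass": "Self-emp-inc"},
    {"age": 45, "workclass": "Private"},
    {"age": 55, "workclass": "State-gov"},
    {"age": 30, "workclass": "Private"},
]

# Group A: age 20 to 40 (inclusive, an assumption) and workclass starting with "S".
group_a = [r for r in rows if 20 <= r["age"] <= 40 and r["workclass"].startswith("S")]

# Group B: age 40 to 60 (inclusive, an assumption) and workclass starting with "P".
group_b = [r for r in rows if 40 <= r["age"] <= 60 and r["workclass"].startswith("P")]

# Concatenation appends the rows of one table to the other; columns stay the same.
combined = group_a + group_b
print(combined)
```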
2.4 Transformation: Conversion, Replacement,
Standardization, and New Feature Generation
• Data are standardized before being stored, analyzed, or
reported. This means that string and date & time values are
converted to follow the same style and format, numbers are
normalized, and new features are created from the existing ones.
• Possible string manipulation operations include extracting substrings,
standardizing text to lower case or upper case, and adding a
prefix or suffix to string values.
• Numbers can be transformed mathematically, for example
by normalization or a logarithmic
transformation.
• In general, data can be transformed to generate new and, hopefully,
more informative input features.
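The transformations named above can be sketched in a few lines of plain Python. Min-max scaling is used here as one common form of normalization (an assumption, since the slides do not say which one is meant), and the function names are hypothetical.

```python
import math


def standardize_text(value):
    """Bring a string to one style: trimmed and lower case."""
    return value.strip().lower()


def min_max_normalize(values):
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Apply the natural logarithm; only valid for strictly positive numbers."""
    return [math.log(v) for v in values]


print(standardize_text("  Germany "))
print(min_max_normalize([10, 20, 30]))
print(log_transform([1, 10, 100]))
```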
Data Manipulation: Numbers, Strings, and
Rules
• String Manipulation node
• Math Formula node
• Rule Engine node
• Read the sales.csv dataset.
• Using the Rule Engine node, create a new column
“currency” with value “USD” for the orders from the USA,
and “EUR” for the orders from Germany.
• Using the Rule Engine node, create a new column
“conversion” with value 1 if currency is “EUR”, and 0.88 if
currency is “USD” (the exchange rate on Nov-04-2018).
• Using the Math Formula node, calculate values in a new
column named “amount-in-EUR” by multiplying the value
in column “amount” by the value in column “conversion”.
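The logic of these three steps can be sketched in plain Python: two rule functions play the role of the Rule Engine nodes, and a multiplication plays the role of the Math Formula node. The sample orders and function names are hypothetical, and this is not KNIME rule syntax.

```python
# Hypothetical orders standing in for sales.csv.
orders = [
    {"country": "USA", "amount": 100.0},
    {"country": "Germany", "amount": 50.0},
]


def currency_rule(country):
    """First rule set: map the order country to a currency."""
    if country == "USA":
        return "USD"
    if country == "Germany":
        return "EUR"
    return None  # no rule matched: treated as a missing value


def conversion_rule(currency):
    """Second rule set: conversion factor to EUR (0.88 is the rate given in the exercise)."""
    if currency == "EUR":
        return 1.0
    if currency == "USD":
        return 0.88
    return None


for order in orders:
    order["currency"] = currency_rule(order["country"])
    order["conversion"] = conversion_rule(order["currency"])
    # Math step: new column computed from two existing columns.
    order["amount-in-EUR"] = order["amount"] * order["conversion"]

print(orders)
```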
Column Expressions for Data Manipulation
• The Column Expressions node is useful
because it can perform multiple data
manipulation tasks at once.
• It can replace combinations of other data
manipulation nodes, such as the String
Manipulation, Math Formula, and Rule Engine
nodes, with this single node.
Exercise
• Read the sales.csv dataset.
• Write an expression that extracts the first three letters of
country names and converts them to upper case letters.
Append a new column and name it “Country_Code”.
• Write an expression that multiplies the sales amount by
the conversion rate. Replace the “amount” column, but
change its type to double.
• Write an expression that assigns the value “N” to the
missing values in the “card” column. Replace the “card”
column.
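One possible way to think about the three expressions is sketched below in plain Python rather than in the Column Expressions node's own expression language. The sample rows are hypothetical, a "conversion" column is assumed to exist already, and `None` is assumed to mark a missing value.

```python
# Hypothetical rows standing in for sales.csv; None marks a missing value (assumption).
rows = [
    {"country": "Germany", "amount": 10, "conversion": 1.0, "card": None},
    {"country": "USA", "amount": 100, "conversion": 0.88, "card": "Y"},
]


def country_code(country):
    """First three letters of the country name, in upper case."""
    return country[:3].upper()


def converted_amount(amount, conversion):
    """Sales amount times conversion rate, returned as a double (float)."""
    return float(amount) * conversion


def fill_card(card):
    """Replace a missing card value with "N"."""
    return "N" if card is None else card


result = []
for row in rows:
    new_row = dict(row)
    new_row["Country_Code"] = country_code(row["country"])  # appended column
    new_row["amount"] = converted_amount(row["amount"], row["conversion"])  # replaced column
    new_row["card"] = fill_card(row["card"])  # replaced column
    result.append(new_row)

print(result)
```

All three manipulations happen in one pass over the table, which is the point of using a single node instead of three.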