IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Issue: 03 | Mar-2014, Available @ http://www.ijret.org
AN APPROACH FOR DISCRIMINATION PREVENTION IN DATA
MINING
Rupanjali Dive1, Anagha Khedkar2
1Matoshri College of Engineering & Research Center, Nashik, University of Pune
2Matoshri College of Engineering & Research Center, Nashik, University of Pune
Abstract
In the age of database technologies, a large amount of data is collected and analyzed using data mining techniques. However, the
main issues in data mining are potential privacy invasion and potential discrimination. Classification is one of the data mining
techniques used for decision making. If the dataset is biased, discriminatory decisions may result. Therefore, in
this paper we review recent state-of-the-art antidiscrimination techniques, focusing on discrimination
discovery and prevention in data mining. We also study a theoretical proposal for improving the resulting data
quality.
Keywords- Antidiscrimination, data mining, direct and indirect discrimination prevention, rule protection, rule
generalization, privacy.
-----------------------------------------------------------------------***-------------------------------------------------------------------
1. INTRODUCTION
In data mining, discrimination is one of the issues discussed in
the recent literature. Discrimination denies the members of one
group opportunities that are available to others. Laws are
designed to prevent such discrimination. Discrimination can be
based on attributes such as religion, nationality, marital status
and age.
Credit card companies, banks and insurance agencies collect
large amounts of data. Companies then use these data for
decision making by means of data mining techniques.
Association and/or classification rules can be used in making
decisions such as loan granting and insurance computation.
Discrimination can be direct and indirect. Direct discrimination
consists of rules or procedures that explicitly mention minority
or disadvantaged groups based on sensitive discriminatory
attributes related to group membership. Indirect discrimination
consists of rules or procedures that, while not explicitly
mentioning discriminatory attributes, intentionally or
unintentionally could generate discriminatory decisions.
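
To make the distinction concrete, the sketch below checks whether a classification rule explicitly mentions a sensitive item in its premise. The item names and the set `SENSITIVE_ITEMS` are hypothetical and chosen only for illustration. A rule that passes this check is a candidate for direct discrimination; a rule that does not may still cause indirect discrimination if its premise is correlated with a sensitive attribute.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    premise: FrozenSet[str]  # items in the rule body
    decision: str            # class item the rule predicts


# Hypothetical sensitive items; a real study would take these from the law
# or from domain experts.
SENSITIVE_ITEMS = frozenset({"nationality=foreign", "marital_status=single"})


def mentions_sensitive_item(rule: Rule) -> bool:
    """True if the rule premise explicitly contains a sensitive item."""
    return bool(rule.premise & SENSITIVE_ITEMS)


# Explicitly mentions a sensitive item: candidate for direct discrimination.
direct_candidate = Rule(frozenset({"nationality=foreign", "job=clerk"}), "loan=deny")
# No sensitive item, but a zip code may act as a proxy: candidate for
# indirect discrimination, which this simple check cannot detect.
indirect_candidate = Rule(frozenset({"zip=12345"}), "loan=deny")
```

Detecting the second kind of rule needs background knowledge that links the apparently neutral items to the sensitive ones, which is why indirect discrimination is harder to handle.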
In this paper, we review the issue of direct and indirect
discrimination. The rest of the paper is organized as follows.
Section 2 reviews the existing literature on the various
approaches. Section 3 analyzes the existing approaches.
Section 4 presents a theoretical proposal for a new approach.
Finally, Section 5 presents the conclusion.
2. RELATED WORK
In this section, we discuss state-of-the-art approaches dealing
with antidiscrimination in data mining. We observe that, in the
recent literature, the issue of antidiscrimination has not been
addressed by many authors.
R. Agrawal and R. Srikant [1] discussed association rule mining
for large databases and presented two algorithms that discover
associations between items in a large database of transactions.
The performance gap with respect to earlier algorithms
increases with the problem size. However, they did not
consider the quantities of the items bought in a transaction.
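
The level-wise idea behind this kind of frequent-itemset mining can be sketched as follows. This is a minimal illustration in the style of Apriori, not the authors' optimized algorithms; the function name, the transactions and the `min_support` threshold are made up for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(baskets, min_support=2)
```

Note that the sketch only records whether an item occurs in a transaction, not how many units were bought, which mirrors the limitation mentioned above.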
T. Calders and S. Verwer [2] presented a modified Naive Bayes
classification approach. The authors classify the data in such a
way that the predictions are independent of a given sensitive
attribute. Such independency restrictions
occur naturally when the decision process leading to the labels
in the data-set was biased; e.g., due to gender or racial
discrimination. This setting is motivated by many cases in
which there exist laws that disallow a decision that is partly
based on discrimination. This approach does not consider
numerical attributes, such as income, as sensitive attributes.
F. Kamiran and T. Calders [3] proposed an approach that
focuses on classification without discrimination. The authors
introduced the idea of Classification with No Discrimination
(CND) and proposed a solution based on “massaging” the data
to remove the discrimination from it with the least possible
changes. The authors also proposed a new solution to the CND
problem. In
this method, the authors introduced a sampling scheme for
making the data discrimination free instead of relabeling the
dataset. A limitation is that the authors do not propose a
discrimination model that can be used in many cases. Also, in
some cases it may be acceptable from an ethical and legal point
of view to have some discrimination.
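
The relabeling ("massaging") idea can be illustrated with a simplified sketch. It assumes each record already carries a `score` from some ranker and that groups are named `favored` and `deprived`; these field names, the data and the choice of `n_swaps` are assumptions made for the example, not details taken from [3].

```python
def discrimination(records):
    """Positive-label rate of the favored group minus that of the deprived group."""
    def rate(group):
        labels = [r["label"] for r in records if r["group"] == group]
        return sum(labels) / len(labels)
    return rate("favored") - rate("deprived")


def massage(records, n_swaps):
    """Return a relabeled copy of the records.

    Promotes the n_swaps highest-scored negatives of the deprived group and
    demotes the n_swaps lowest-scored positives of the favored group, so the
    overall number of positive labels stays the same when both lists are full.
    """
    out = [dict(r) for r in records]  # copy so the input is left untouched
    promote = sorted(
        (r for r in out if r["group"] == "deprived" and r["label"] == 0),
        key=lambda r: r["score"], reverse=True,
    )[:n_swaps]
    demote = sorted(
        (r for r in out if r["group"] == "favored" and r["label"] == 1),
        key=lambda r: r["score"],
    )[:n_swaps]
    for r in promote:
        r["label"] = 1
    for r in demote:
        r["label"] = 0
    return out


records = [
    {"group": "favored", "label": 1, "score": 0.9},
    {"group": "favored", "label": 1, "score": 0.8},
    {"group": "favored", "label": 1, "score": 0.6},
    {"group": "favored", "label": 0, "score": 0.3},
    {"group": "deprived", "label": 1, "score": 0.7},
    {"group": "deprived", "label": 0, "score": 0.55},
    {"group": "deprived", "label": 0, "score": 0.4},
    {"group": "deprived", "label": 0, "score": 0.2},
]
massaged = massage(records, n_swaps=1)
```

Choosing the records closest to the decision boundary is what keeps the number of changes small; a sampling-based variant would instead duplicate or drop such records rather than change their labels.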
D. Pedreschi, S. Ruggieri, and F. Turini [4] presented the issue
of discrimination in the social sense, i.e. against minorities and
disadvantaged groups. The authors work with a dataset of
decision records and use classification rules to address the
problem. They also introduce a quantitative measure of
discrimination.
D. Pedreschi, S. Ruggieri, and F. Turini [5] presented a method
for finding evidence of discrimination in datasets of
historical decision records in socially sensitive tasks such as
access to credit, mortgage, insurance, and the labor market.
They focus on a rule-based framework for direct and indirect
discrimination and on quantitative measures.
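
One quantitative measure used in this rule-based line of work is the extended lift (elift), which compares the confidence of a rule with and without the sensitive item in its premise: elift(A, B -> C) = conf(A, B -> C) / conf(B -> C). The sketch below computes it over a small, made-up set of decision records; the attribute names and the data are assumptions for illustration only.

```python
def confidence(records, premise, decision):
    """conf(premise -> decision): share of records matching the premise that
    also match the decision."""
    covered = [r for r in records if premise.items() <= r.items()]
    if not covered:
        raise ValueError("premise matches no record")
    hits = [r for r in covered if decision.items() <= r.items()]
    return len(hits) / len(covered)


def elift(records, sensitive, context, decision):
    """Extended lift of the rule (sensitive, context -> decision)."""
    base = confidence(records, context, decision)
    if base == 0:
        raise ValueError("decision never occurs in this context")
    return confidence(records, {**sensitive, **context}, decision) / base


# Hypothetical decision records: 3 of 4 foreign applicants are denied,
# 1 of 4 local applicants is denied.
decisions = (
    [{"nationality": "foreign", "city": "X", "loan": "deny"}] * 3
    + [{"nationality": "foreign", "city": "X", "loan": "grant"}]
    + [{"nationality": "local", "city": "X", "loan": "deny"}]
    + [{"nationality": "local", "city": "X", "loan": "grant"}] * 3
)
value = elift(decisions, {"nationality": "foreign"}, {"city": "X"}, {"loan": "deny"})
```

A value above 1 means that adding the sensitive item to the premise raises the chance of the negative decision; how far above 1 counts as discriminatory is a threshold that has to be fixed separately.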
S. Hajian, J. Domingo-Ferrer, and A. Martinez-Balleste [6]
introduced anti-discrimination in the context of cyber
security. They proposed a data transformation method for
discrimination prevention and considered several
discriminatory attributes and their combinations. The issue of
data quality is also addressed. The limitations of this method
are that the authors do not run it on real datasets and
do not consider background knowledge (indirect
discrimination).
Faisal Kamiran, Toon Calders and Mykola Pechenizkiy [7]
presented a model for decision making in data mining. The
authors proposed a new technique, discrimination-aware
decision tree learning. The main objective is to learn a
classification model from potentially biased historical data,
taking care that it still generates accurate predictions for future
decision making. The authors introduced two
techniques, Dependency-Aware Tree Construction and Leaf
Relabeling, for incorporating discrimination awareness into the
decision tree construction process.
Faisal Kamiran and Toon Calders [8] introduced a classification
model that works impartially on future data. The limitations
of this approach are that the authors do not consider other
classification models for discrimination-free classification and
do not incorporate numerical attributes or groups of attributes
as sensitive attribute(s).
Sara Hajian and Josep Domingo-Ferrer [9]
proposed preprocessing methods which overcome the above
limitations and issues. The authors introduced new data
transformation methods based on rule protection and rule
generalization. These methods handle both direct and indirect
discrimination and can also deal with several
discriminatory items.
Thus, based on the issues and limitations identified in the
literature, new data transformation methods for discrimination
prevention need to be designed.
3. OUR ANALYSIS
During our investigation of the recent state-of-the-art literature,
we identified several issues. First, the literature attempts to
detect discrimination in the original data for only
one discriminatory item and based on a single measure.
Second, it cannot guarantee that the transformed data set is
really discrimination free.
Third, the literature focuses on direct discrimination.
Fourth, the state-of-the-art approaches do not provide any
measure to evaluate how much discrimination has been
removed, nor do they consider the
amount of information loss generated.
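
Such an evaluation could be sketched with two simple quantities: the share of the measured discrimination that a transformation removed, and the share of attribute values it changed as a rough proxy for information loss. Both functions and their names are hypothetical illustrations, not measures defined in the surveyed papers.

```python
def discrimination_removed(before, after):
    """Fraction of the original discrimination score removed by a transformation."""
    if before == 0:
        raise ValueError("no discrimination was measured before the transformation")
    return (before - after) / before


def changed_fraction(original, transformed):
    """Fraction of attribute values that differ between two datasets with the
    same rows and attributes (a crude proxy for information loss)."""
    if len(original) != len(transformed):
        raise ValueError("datasets must have the same number of rows")
    total = 0
    changed = 0
    for row_before, row_after in zip(original, transformed):
        for key in row_before:
            total += 1
            changed += row_before[key] != row_after[key]
    return changed / total


# Made-up example: one label out of four cell values was changed.
original = [{"group": "deprived", "label": 0}, {"group": "favored", "label": 1}]
transformed = [{"group": "deprived", "label": 1}, {"group": "favored", "label": 1}]
loss = changed_fraction(original, transformed)
removed = discrimination_removed(before=0.5, after=0.125)
```

Reporting both numbers side by side makes the trade-off visible: a transformation that removes all measured discrimination but rewrites most of the data would score well on the first quantity and poorly on the second.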
4. DIRECT AND INDIRECT DISCRIMINATION
The issues investigated in the recent literature were
discussed in Section 3. Based on this investigation, we present a
new preprocessing methodology for discrimination prevention.
The central theme of our approach is to use data
transformation methods that help to prevent direct
discrimination, indirect discrimination or both at the
same time.
To meet this objective, the following steps need to be carried
out.
• The first step is to measure discrimination and identify the
categories and groups of individuals that have been directly
and/or indirectly discriminated against in the
decision-making processes.
• The second step is to transform the data in the proper way to
remove all those discriminatory biases.
• Third, discrimination-free data models can be generated
from the transformed data. However, the data
transformation must be conducted in such a way that data
quality is not harmed.
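
The three steps above can be sketched as a small pipeline. Everything in it is a placeholder under stated assumptions: records are hypothetical `(group, label)` pairs, `rate_gap` stands in for a real discrimination measure, `naive_transform` for a real data transformation method, and `majority_label` for a real learner.

```python
def rate_gap(records):
    """Step 1 placeholder: positive rate of 'favored' minus that of 'deprived'."""
    def rate(group):
        labels = [y for g, y in records if g == group]
        return sum(labels) / len(labels)
    return rate("favored") - rate("deprived")


def naive_transform(records):
    """Step 2 placeholder: flip deprived negatives one at a time until the gap closes."""
    out = list(records)
    for i, (group, label) in enumerate(out):
        if rate_gap(out) <= 0:
            break
        if group == "deprived" and label == 0:
            out[i] = (group, 1)
    return out


def majority_label(records):
    """Step 3 placeholder: a 'model' that always predicts the most common label."""
    positives = sum(y for _, y in records)
    return int(positives * 2 >= len(records))


def run_pipeline(records, measure, transform, learn):
    before = measure(records)      # step 1: measure discrimination
    cleaned = transform(records)   # step 2: transform the data
    after = measure(cleaned)
    model = learn(cleaned)         # step 3: learn from the transformed data
    # Number of changed records, as a simple data quality indicator.
    changed = sum(a != b for a, b in zip(records, cleaned))
    return model, before, after, changed


data = [("favored", 1), ("favored", 1), ("favored", 0), ("favored", 0),
        ("deprived", 1), ("deprived", 0), ("deprived", 0), ("deprived", 0)]
model, before, after, changed = run_pipeline(
    data, rate_gap, naive_transform, majority_label
)
```

Passing the three steps in as functions keeps the pipeline independent of any particular measure or transformation, and returning the number of changed records makes the data quality requirement of the third step explicit.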
5. CONCLUSIONS
In this paper, we discussed the issues and limitations of the
recent state-of-the-art approaches. Based on these issues, we
studied an approach that uses data transformation methods.
This approach helps to prevent direct and indirect
discrimination, while taking care to maintain data quality and
privacy during the
transformation. Thus, our future work is to implement a
transformation method such that the data quality will not be
disturbed.
REFERENCES
[1]. R. Agrawal and R. Srikant, “Fast Algorithms for Mining
Association Rules in Large Databases,” Proc. 20th Int’l Conf.
Very Large Data Bases, pp. 487-499, 1994.
[2]. T. Calders and S. Verwer, “Three Naive Bayes Approaches
for Discrimination-Free Classification,” Data Mining and
Knowledge Discovery, vol. 21, no. 2, pp. 277-292, 2010.
[3]. F. Kamiran and T. Calders, “Classification with no
Discrimination by Preferential Sampling,” Proc. 19th Machine
Learning Conf. Belgium and The Netherlands, 2010.
[4]. European Commission, “EU Directive 2006/54/EC on
Anti-Discrimination,”
http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2006:204:0023:0036:en:PDF,
2006.
[5]. D. Pedreschi, S. Ruggieri, and F. Turini, “Integrating
Induction and Deduction for Finding Evidence of
Discrimination,” Proc. 12th ACM Int’l Conf. Artificial
Intelligence and Law (ICAIL ’09), pp. 157-166, 2009.
[6]. S. Hajian, J. Domingo-Ferrer, and A. Martínez-Ballesté,
“Rule Protection for Indirect Discrimination Prevention in Data
Mining,” Proc. Eighth Int’l Conf. Modeling Decisions for
Artificial Intelligence (MDAI ’11), pp. 211-222, 2011.
[7]. F. Kamiran, T. Calders, and M. Pechenizkiy,
“Discrimination Aware Decision Tree Learning,” Proc. IEEE
Int’l Conf. Data Mining (ICDM ’10), pp. 869-874, 2010.
[8]. F. Kamiran, T. Calders, and M. Pechenizkiy,
“Discrimination Aware Decision Tree Learning,” Proc. IEEE
Int’l Conf. Data Mining (ICDM ’10), pp. 869-874, 2010.

More Related Content

PDF
C0364012016
PDF
J046065457
PDF
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE
PDF
Ijcet 06 07_004
PDF
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...
PDF
A methodology for direct and indirect discrimination prevention in data mining
PDF
Measuring Improvement in Access to Complete Data in Healthcare Collaborative ...
PDF
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
C0364012016
J046065457
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE
Ijcet 06 07_004
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...
A methodology for direct and indirect discrimination prevention in data mining
Measuring Improvement in Access to Complete Data in Healthcare Collaborative ...
A SURVEY OF LINK MINING AND ANOMALIES DETECTION

What's hot (19)

PDF
11.software modules clustering an effective approach for reusability
PDF
27 11 sep17 29aug 8513 9956-1-ed (edit)
PDF
A survey of memory based methods for collaborative filtering based techniques
PDF
Gene Selection Based on Rough Set Applications of Rough Set on Computational ...
PDF
Classification By Clustering Based On Adjusted Cluster
PDF
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
PDF
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
PDF
Framework for opinion as a service on review data of customer using semantics...
PDF
IRJET- A Survey on Link Prediction Techniques
PDF
Evaluating the efficiency of rule techniques for file
PDF
Clustering Prediction Techniques in Defining and Predicting Customers Defecti...
PDF
PDF
Meta Classification Technique for Improving Credit Card Fraud Detection
PDF
Performance Analysis of Selected Classifiers in User Profiling
PDF
A survey on discrimination deterrence in data mining
PDF
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
PDF
A comprehensive survey of link mining and anomalies detection
PDF
System Adoption: Socio-Technical Integration
PDF
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...
11.software modules clustering an effective approach for reusability
27 11 sep17 29aug 8513 9956-1-ed (edit)
A survey of memory based methods for collaborative filtering based techniques
Gene Selection Based on Rough Set Applications of Rough Set on Computational ...
Classification By Clustering Based On Adjusted Cluster
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
Framework for opinion as a service on review data of customer using semantics...
IRJET- A Survey on Link Prediction Techniques
Evaluating the efficiency of rule techniques for file
Clustering Prediction Techniques in Defining and Predicting Customers Defecti...
Meta Classification Technique for Improving Credit Card Fraud Detection
Performance Analysis of Selected Classifiers in User Profiling
A survey on discrimination deterrence in data mining
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A comprehensive survey of link mining and anomalies detection
System Adoption: Socio-Technical Integration
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...
Ad

Viewers also liked (20)

PDF
Intelligent computing techniques on medical image segmentation and analysis a...
PDF
Development of pavement management strategies for
PDF
Design and performance analysis of band pass filter
PDF
Two level data security using steganography and 2 d cellular automata
PDF
Investigation of behaviour of 3 degrees of freedom
PDF
Securing voip communications in an open network
PDF
It business management parameters framework from indian context
PDF
Performance analysis of fully depleted dual material
PDF
Human action recognition using local space time features and adaboost svm
PDF
Insect inspired hexapod robot for terrain navigation
PDF
Remedy for disease affected iris in iris recognition
PDF
Design and development of load sharing multipath routing protcol for mobile a...
PDF
Exposure to elevated temperatures and cooled under different regimes – a stud...
PDF
Wireless data transmission through uart port using arm & rf transceiver
PDF
Rule based messege filtering and blacklist management for online social network
PDF
A comparative study on classification of image segmentation methods with a fo...
PDF
Hazard object reporting to respective authorities
PDF
Stability analysis of open pit slope by finite difference method
PDF
Hyperspectral image mixed noise reduction based on improved k svd algorithm
PDF
An overview of stress analysis of high energy pipeline systems used in therma...
Intelligent computing techniques on medical image segmentation and analysis a...
Development of pavement management strategies for
Design and performance analysis of band pass filter
Two level data security using steganography and 2 d cellular automata
Investigation of behaviour of 3 degrees of freedom
Securing voip communications in an open network
It business management parameters framework from indian context
Performance analysis of fully depleted dual material
Human action recognition using local space time features and adaboost svm
Insect inspired hexapod robot for terrain navigation
Remedy for disease affected iris in iris recognition
Design and development of load sharing multipath routing protcol for mobile a...
Exposure to elevated temperatures and cooled under different regimes – a stud...
Wireless data transmission through uart port using arm & rf transceiver
Rule based messege filtering and blacklist management for online social network
A comparative study on classification of image segmentation methods with a fo...
Hazard object reporting to respective authorities
Stability analysis of open pit slope by finite difference method
Hyperspectral image mixed noise reduction based on improved k svd algorithm
An overview of stress analysis of high energy pipeline systems used in therma...
Ad

Similar to An approach for discrimination prevention in data mining (20)

PDF
Classification with No Direct Discrimination
PDF
Data Mining System and Applications: A Review
PDF
Privacy preservation techniques in data mining
PDF
Privacy preservation techniques in data mining
DOCX
Running Head Data Mining in The Cloud .docx
PDF
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PDF
Ez36937941
PDF
Study of Privacy Discrimination and Prevention in Data Mining
PDF
Hu3414421448
PDF
A Survey on the Classification Techniques In Educational Data Mining
PDF
Hy3414631468
PDF
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdf
PDF
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAIN
PDF
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAIN
PDF
The Survey of Data Mining Applications And Feature Scope
PDF
Selection of Articles using Data Analytics for Behavioral Dissertation Resear...
PDF
Protection of Direct and Indirect Discrimination using Prevention Methods
PDF
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
PDF
Data discrimination prevention in customer relationship managment
PDF
Data Mining: Investment risk in the bank
Classification with No Direct Discrimination
Data Mining System and Applications: A Review
Privacy preservation techniques in data mining
Privacy preservation techniques in data mining
Running Head Data Mining in The Cloud .docx
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
Ez36937941
Study of Privacy Discrimination and Prevention in Data Mining
Hu3414421448
A Survey on the Classification Techniques In Educational Data Mining
Hy3414631468
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdf
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAIN
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAIN
The Survey of Data Mining Applications And Feature Scope
Selection of Articles using Data Analytics for Behavioral Dissertation Resear...
Protection of Direct and Indirect Discrimination using Prevention Methods
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
Data discrimination prevention in customer relationship managment
Data Mining: Investment risk in the bank

More from eSAT Publishing House (20)

PDF
Likely impacts of hudhud on the environment of visakhapatnam
PDF
Impact of flood disaster in a drought prone area – case study of alampur vill...
PDF
Hudhud cyclone – a severe disaster in visakhapatnam
PDF
Groundwater investigation using geophysical methods a case study of pydibhim...
PDF
Flood related disasters concerned to urban flooding in bangalore, india
PDF
Enhancing post disaster recovery by optimal infrastructure capacity building
PDF
Effect of lintel and lintel band on the global performance of reinforced conc...
PDF
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
PDF
Wind damage to buildings, infrastrucuture and landscape elements along the be...
PDF
Shear strength of rc deep beam panels – a review
PDF
Role of voluntary teams of professional engineers in dissater management – ex...
PDF
Risk analysis and environmental hazard management
PDF
Review study on performance of seismically tested repaired shear walls
PDF
Monitoring and assessment of air quality with reference to dust particles (pm...
PDF
Low cost wireless sensor networks and smartphone applications for disaster ma...
PDF
Coastal zones – seismic vulnerability an analysis from east coast of india
PDF
Can fracture mechanics predict damage due disaster of structures
PDF
Assessment of seismic susceptibility of rc buildings
PDF
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
PDF
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Likely impacts of hudhud on the environment of visakhapatnam
Impact of flood disaster in a drought prone area – case study of alampur vill...
Hudhud cyclone – a severe disaster in visakhapatnam
Groundwater investigation using geophysical methods a case study of pydibhim...
Flood related disasters concerned to urban flooding in bangalore, india
Enhancing post disaster recovery by optimal infrastructure capacity building
Effect of lintel and lintel band on the global performance of reinforced conc...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Shear strength of rc deep beam panels – a review
Role of voluntary teams of professional engineers in dissater management – ex...
Risk analysis and environmental hazard management
Review study on performance of seismically tested repaired shear walls
Monitoring and assessment of air quality with reference to dust particles (pm...
Low cost wireless sensor networks and smartphone applications for disaster ma...
Coastal zones – seismic vulnerability an analysis from east coast of india
Can fracture mechanics predict damage due disaster of structures
Assessment of seismic susceptibility of rc buildings
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...

Recently uploaded (20)

PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
OOP with Java - Java Introduction (Basics)
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
DOCX
573137875-Attendance-Management-System-original
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPT
Project quality management in manufacturing
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
PPT on Performance Review to get promotions
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
OOP with Java - Java Introduction (Basics)
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
573137875-Attendance-Management-System-original
UNIT 4 Total Quality Management .pptx
Internet of Things (IOT) - A guide to understanding
R24 SURVEYING LAB MANUAL for civil enggi
Project quality management in manufacturing
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPT on Performance Review to get promotions
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx

An approach for discrimination prevention in data mining

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Issue: 03 | Mar-2014, Available @ http://www.ijret.org 412 AN APPROACH FOR DISCRIMINATION PREVENTION IN DATA MINING Rupanjali Dive1 , Anagha Khedkar2 1 Matoshri College of Engineering & Research Center, Nashik University of Pune 2 Matoshri College of Engineering & Research Center, Nashik University of Pune Abstract In the age of Database technologies a large amount of data is collected and analyzed by using data mining techniques. However, the main issue in data mining is potential privacy invasion and potential discrimination. One of the techniques used in data mining for making decision is classification. On the other hand, if the dataset is biased then the discriminatory decision may occur. Therefore, in this paper we review the recent state of the art approaches for antidiscrimination techniques and also focuses on discrimination discovery and prevention in data mining. On the other hand, we also study a theoretical proposal for enhancing the results of the data quality. Keywords- Antidiscrimination, data mining, direct and indirect discrimination prevention, rule protection, rule generalization, privacy. -----------------------------------------------------------------------***------------------------------------------------------------------- 1. INTRODUCTION In data mining, discrimination is one of the issues discussed in the recent literature. Discrimination denies the members of one group with others. A law is designed to prevent discrimination in data mining. Discrimination can be done on attributes viz. religion, nationality, marital status and age. A large amount of data is collected by credit card companies, bank and insurance agencies. 
Thus, these collected data are auxiliary utilized by companies for decision making purpose in data mining techniques. The association and or classification rules can be used in making the decision for loan granting and insurance computation. Discrimination can be direct and indirect. Direct discrimination consists of rules or procedures that explicitly mention minority or disadvantaged groups based on sensitive discriminatory attributes related to group membership. Indirect discrimination consists of rules or procedures that, while not explicitly mentioning discriminatory attributes, intentionally or unintentionally could generate discriminatory decisions. In this paper, we review the issue of direct and indirect discrimination. The rest of the paper is organized as follows. The section 2 discussed the existing literature review of the various approaches. Section 3 discussed the analysis of the existing approaches. Section 4 presented a theoretical proposal of new approach. At the end, conclusion is presented in section 5. 2. RELATED WORK In this section, we discussed the state of the art approaches dealing with the antidiscrimination in data mining. However, we observe in recent literature, the issue of antidiscrimination is not attended by the several authors. R.Agrawal and R.Srikant [1] discussed the association rule method for the large database. Also they presented two algorithms that discover association between items in a large database of transactions. However, the performance gap is increases with the problem size. On the other side, they did not consider the quantities of the items bought in a transaction. T.Calders and S.Verwer [2] presented a modified Naive Bayes classification approach. In this, the author performs classification of the data in such a way that focuses on independent sensitive attribute. 
Such independency restrictions occur naturally when the decision process leading to the labels in the data-set was biased; e.g., due to gender or racial discrimination. This setting is motivated by many cases in which there exist laws that disallow a decision that is partly based on discrimination. This approach does not consider numerical attributes viz. Income as a sensitive attribute. F.Kamiran and T.Calders [3] proposed an approach which focuses on the concept of classification without discrimination. In this, the author introduced the idea of Classification with No Discrimination (CND). Thus, the author proposed a solution based on “massaging” the data to remove the discrimination from it with the least possible changes. On the other hand, the author also proposed a new solution to the CND problem. In
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Issue: 03 | Mar-2014, Available @ http://www.ijret.org 413 this method, the author introduced a sampling scheme for making the data discrimination free instead of relabeling the dataset. The issues the author did not consider such as they do not proposing discrimination model which is used in many cases. Also, it is acceptable from an ethical and legal point of view to have some discrimination. D. Pedreschi, S. Ruggieri, and F. Turini [4] presented the issue of discrimination in social sense viz. against minorities and disadvantaged groups. The author attempt to handle a dataset of decision records In this approach, the author uses a classification rule for solving a problem. On the other hand, a measure of quantitative discrimination is also introduced. D. Pedreschi, S. Ruggieri, and F. Turini[5] presented a method that is used find the evidence of discrimination in datasets of historical decision records in socially sensitive tasks viz. Access to credit, mortgage, insurance, and labor market. They also focus on the rule based framework process for direct and indirect discrimination. In this, they also focus on the quantitative measures. S. Hajian, J. Domingo-Ferrer, and A. Martinez-Balleste[6] introduced an anti-discrimination in the context of cyber security. And proposed data transformation method for discrimination prevention and considered several discriminatory attributes and their combinations. The issue of data quality is also addressed. But, the limitation of this method is that first, they does not run method on real datasets and also do not consider background knowledge (indirect discrimination). Faisal Kamiran, Toon Calders and Mykola Pechenizkiy [7] presented a model for decision making in data mining. 
The authors proposed a new technique, viz. discrimination-aware decision tree learning. The main objective is to learn a classification model from potentially biased historical data in such a way that it still generates accurate predictions for future decision making. The authors introduced two techniques, viz. Dependency-Aware Tree Construction and Leaf Relabeling, for incorporating discrimination awareness into the decision tree construction process.

Faisal Kamiran and Toon Calders [8] introduced a classification model that works impartially on future data. The limitation of this approach is that it does not consider other classification models for discrimination-free classification. It also does not incorporate numerical attributes or groups of attributes as sensitive attribute(s).

Sara Hajian and Josep Domingo-Ferrer [9] proposed preprocessing methods that overcome the above limitations and issues. The authors introduced new data transformation methods that use rule protection and rule generalization. These methods handle both direct and indirect discrimination and can also deal with several discriminatory items.

Thus, based on the issues and limitations investigated in the literature, new data transformation methods for discrimination prevention need to be designed.

3. OUR ANALYSIS
During the investigation of the recent state-of-the-art literature, we identified some issues. First, the literature attempts to detect discrimination in the original data for only one discriminatory item and based on a single measure. Second, it cannot guarantee that the transformed dataset is really discrimination-free. Third, the literature focuses only on direct discrimination. Fourth, the state-of-the-art approaches do not provide any measure to evaluate how much discrimination has been removed. Likewise, the approaches did not concentrate on the amount of information loss generated.

4. DIRECT AND INDIRECT DISCRIMINATION
The issues investigated in the recent literature were discussed in Section 3. Based on this investigation, we present a new preprocessing methodology for discrimination prevention. The central theme of our approach is to use data transformation methods that help to prevent direct discrimination, indirect discrimination, or both at the same time. To meet this objective, the following steps need to be carried out.
    • The first step is to measure discrimination and identify the categories and groups of individuals that have been directly and/or indirectly discriminated against in the decision-making processes.
    • The second step is to transform the data in the proper way to remove all those discriminatory biases.
    • Third, discrimination-free data models can be generated by using the transformed data.
However, the data transformation must be conducted in such a way that data quality is not harmed.

5. CONCLUSIONS
In this paper, we discussed the issues and limitations of the recent state-of-the-art approaches. Based on these issues, we studied an approach that uses a data transformation method. This approach helps to prevent direct and indirect discrimination, while care is taken to maintain data quality and privacy during the transformation. Thus, our future work is to implement a
  • 3. transformation method such that the data quality will not be disturbed.

REFERENCES
[1]. R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. 20th Int’l Conf. Very Large Data Bases, pp. 487-499, 1994.
[2]. T. Calders and S. Verwer, “Three Naive Bayes Approaches for Discrimination-Free Classification,” Data Mining and Knowledge Discovery, vol. 21, no. 2, pp. 277-292, 2010.
[3]. F. Kamiran and T. Calders, “Classification with no Discrimination by Preferential Sampling,” Proc. 19th Machine Learning Conf. Belgium and The Netherlands, 2010.
[4]. European Commission, “EU Directive 2006/54/EC on Anti-Discrimination,” http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2006:204:0023:0036:en:PDF, 2006.
[5]. D. Pedreschi, S. Ruggieri, and F. Turini, “Integrating Induction and Deduction for Finding Evidence of Discrimination,” Proc. 12th ACM Int’l Conf. Artificial Intelligence and Law (ICAIL ’09), pp. 157-166, 2009.
[6]. S. Hajian, J. Domingo-Ferrer, and A. Martínez-Ballesté, “Rule Protection for Indirect Discrimination Prevention in Data Mining,” Proc. Eighth Int’l Conf. Modeling Decisions for Artificial Intelligence (MDAI ’11), pp. 211-222, 2011.
[7]. F. Kamiran, T. Calders, and M. Pechenizkiy, “Discrimination Aware Decision Tree Learning,” Proc. IEEE Int’l Conf. Data Mining (ICDM ’10), pp. 869-874, 2010.
[8]. F. Kamiran, T. Calders, and M. Pechenizkiy, “Discrimination Aware Decision Tree Learning,” Proc. IEEE Int’l Conf. Data Mining (ICDM ’10), pp. 869-874, 2010.
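
ILLUSTRATIVE SKETCH
The steps listed in Section 4 (measure discrimination, transform the data, then check the result) can be illustrated with a small Python sketch. It is only a toy example under stated assumptions, not the method of any cited work. It assumes the extended lift (elift) from the rule-based discrimination discovery literature as the measure, i.e. conf(A,B -> C) / conf(B -> C) for a sensitive itemset A, a context B, and a decision C. The records, the attribute names, and the relabeling used as the "transformation" are hypothetical; in particular, the relabeling is a stand-in and is not rule protection or rule generalization. The fraction of changed records is used as a crude proxy for information loss.

```python
from typing import Dict, List

Record = Dict[str, str]


def confidence(records: List[Record], premise: Record, decision: Record) -> float:
    """Fraction of the records matching `premise` that also match `decision`."""
    covered = [r for r in records if premise.items() <= r.items()]
    if not covered:
        raise ValueError("no record matches the premise")
    hits = [r for r in covered if decision.items() <= r.items()]
    return len(hits) / len(covered)


def elift(records: List[Record], sensitive: Record,
          context: Record, decision: Record) -> float:
    """Extended lift: conf(sensitive, context -> decision) / conf(context -> decision)."""
    with_sensitive = confidence(records, {**sensitive, **context}, decision)
    context_only = confidence(records, context, decision)
    if context_only == 0.0:
        raise ValueError("the decision never occurs in this context")
    return with_sensitive / context_only


def information_loss(original: List[Record], transformed: List[Record]) -> float:
    """Fraction of records changed by the transformation (a crude proxy)."""
    changed = sum(1 for a, b in zip(original, transformed) if a != b)
    return changed / len(original)


def make(gender: str, credit: str) -> Record:
    # Hypothetical decision record; all attribute names are assumptions.
    return {"gender": gender, "city": "X", "credit": credit}


# Step 1: measure discrimination on hypothetical historical decisions.
original = (
    [make("female", "deny") for _ in range(3)]
    + [make("female", "grant")]
    + [make("male", "deny")]
    + [make("male", "grant") for _ in range(3)]
)
sensitive, context, decision = {"gender": "female"}, {"city": "X"}, {"credit": "deny"}
before = elift(original, sensitive, context, decision)

# Step 2: toy transformation (relabel two decisions); a stand-in only.
transformed = [dict(r) for r in original]
transformed[0]["credit"] = "grant"  # one female applicant: deny -> grant
transformed[5]["credit"] = "deny"   # one male applicant: grant -> deny

# Step 3: check how much discrimination remains and how much data changed.
after = elift(transformed, sensitive, context, decision)
loss = information_loss(original, transformed)
print(before, after, loss)
```

On this toy data the elift drops from 1.5 to 1.0 while a quarter of the records are modified, which shows why the amount of removed discrimination and the information loss need to be reported together.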