GLOBALSOFT TECHNOLOGIES 
A Probabilistic Approach to String Transformation 
Abstract: 
Many problems in natural language processing, data mining, information retrieval, and 
bioinformatics can be formalized as string transformation, a task defined as follows: given an 
input string, the system generates the k most likely output strings corresponding to the input 
string. This paper proposes a novel probabilistic approach to string transformation that is 
both accurate and efficient. The approach comprises a log-linear model, a method for training 
the model, and an algorithm for generating the top k candidates, whether or not a predefined 
dictionary is available. The log-linear model is defined as a conditional probability 
distribution of an output string and a rule set for the transformation, conditioned on an input 
string. The learning method employs maximum likelihood estimation for parameter estimation. 
The string generation algorithm, which is based on pruning, is guaranteed to generate the 
optimal top k candidates. The proposed method is applied to the correction of spelling errors in 
queries and to the reformulation of queries in web search. Experimental results on large-scale 
data show that the proposed approach improves upon existing methods in terms of both accuracy 
and efficiency in different settings. 
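The abstract describes the model and its training only in words. As a rough sketch, a log-linear model of this kind can be written in the generic form below. The symbols are assumptions introduced here for illustration and are not taken from the source: s_i is the input string, s_o an output string, R the set of rules used to transform s_i into s_o, \lambda_r a weight for rule r (assuming one weight per rule), and \mathcal{Z}(s_i) the set of all (output string, rule set) pairs reachable from s_i. Maximum likelihood estimation then, in its simplest form, picks the weights that maximize the log-probability of the training examples indexed by j.

```latex
P(s_o, R \mid s_i) =
  \frac{\exp\left(\sum_{r \in R} \lambda_r\right)}
       {\sum_{(s', R') \in \mathcal{Z}(s_i)} \exp\left(\sum_{r' \in R'} \lambda_{r'}\right)}
\qquad
\lambda^{*} = \arg\max_{\lambda} \sum_{j} \log P\left(s_o^{(j)}, R^{(j)} \mid s_i^{(j)}\right)
```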
Architecture: 
IEEE PROJECTS & SOFTWARE DEVELOPMENTS 
IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS PROJECTS|IEEE 
BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE PROJECTS 
CELL: +91 98495 39085, +91 99662 35788, +91 98495 57908, +91 97014 40401 
Visit: www.finalyearprojects.org Mail to:ieeefinalsemprojects@gmail.com
EXISTING SYSTEM: 
Previous work on string transformation can be categorized into two groups. Some work 
mainly considered efficient generation of strings. Other work tried to learn the model with 
different approaches. However, efficiency is not an important consideration in the latter 
methods. The existing work does not focus on enhancing both the accuracy and the 
efficiency of string transformation. 
PROPOSED SYSTEM: 
String transformation has many applications in data mining, natural language processing, 
information retrieval, and bioinformatics. String transformation has been studied in different 
specific tasks such as database record matching, spelling error correction, query reformulation 
and synonym mining. The major difference between our work and the existing work is that we 
focus on enhancement of both accuracy and efficiency of string transformation. 
Modules: 
1. Registration 
2. Login 
3. Spelling Error Correction 
4. String Transformation 
5. String mining
Modules Description 
Registration: 
In this module, an Author (Owner) or User must register first; only 
then can he/she access the database. 
Login: 
In this module, any of the above-mentioned users must log in by 
providing their email ID and password. 
Spelling Error Correction: 
In this module, a user who wants to check the spelling of a string can 
have it checked and corrected automatically. 
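The source does not say how this module picks a correction. As a minimal sketch only, the Java below substitutes a plain Levenshtein edit-distance lookup against a small word list; it is not the log-linear method from the abstract. The class name `SpellingSketch`, its methods, and the sample dictionary are hypothetical.

```java
import java.util.List;

final class SpellingSketch {
    // Levenshtein distance via dynamic programming over two rolling rows
    // (unit cost for insertion, deletion, and substitution).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the dictionary word closest to the input; on ties the earlier word wins.
    // If the dictionary is empty, the input is returned unchanged.
    static String correct(String word, List<String> dictionary) {
        String best = word;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = editDistance(word, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary, for illustration only.
        List<String> dictionary = List.of("string", "strong", "transformation");
        System.out.println(correct("transformaton", dictionary));
    }
}
```

A linear scan like this is only practical for small dictionaries, and it ranks candidates by edit distance alone rather than by probability.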
String Transformation: 
Here we use two techniques for searching strings: 1) String 
Generation, 2) String Transformation. 
String Generation: 
This means that we have generated 50,000 strings in alphabetical order, from a to z, 
such as a, aa, ….., z. 
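One way to read the generation step is as a depth-first walk over the letters a to z that emits strings in dictionary order. The Java sketch below follows that reading. The maximum string length and the way the 50,000 limit is applied are not stated in the source, so both parameters, along with the class name `StringGenerationSketch`, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class StringGenerationSketch {
    // Collects up to `limit` strings over a..z, none longer than `maxLength`,
    // in dictionary order: a, aa, aaa, ..., ab, ..., z.
    static List<String> generate(int limit, int maxLength) {
        List<String> out = new ArrayList<>();
        extend(new StringBuilder(), limit, maxLength, out);
        return out;
    }

    // Depth-first extension: emit the prefix plus one letter, then recurse
    // on it before moving on to the next letter.
    private static void extend(StringBuilder prefix, int limit, int maxLength, List<String> out) {
        if (prefix.length() == maxLength) {
            return;
        }
        for (char c = 'a'; c <= 'z' && out.size() < limit; c++) {
            prefix.append(c);
            out.add(prefix.toString());
            extend(prefix, limit, maxLength, out);
            prefix.setLength(prefix.length() - 1);
        }
    }

    public static void main(String[] args) {
        // Assumed parameters: 50,000 strings of at most 4 letters.
        List<String> strings = generate(50_000, 4);
        System.out.println(strings.size() + " strings, first = " + strings.get(0));
    }
}
```

With these assumed parameters the limit is reached before the walk gets to z, so a real implementation would need a different length cap or ordering to cover the full range the text describes.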
String Transformation:
This means that we give the user the benefit of String Generation as well as 
string aliases. This is useful for the user: for example, if the end user types “TKDE”, it 
is treated as equal to “Transactions on Knowledge and Data Engineering”. 
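The alias behaviour described above can be sketched as a simple lookup table. In the Java below, the class name `AliasLookupSketch`, the `expand` method, and the case-insensitive matching are assumptions; only the TKDE entry comes from the text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class AliasLookupSketch {
    // Alias table; the single entry is the example given in the text.
    private static final Map<String, String> ALIASES = new HashMap<>();

    static {
        ALIASES.put("TKDE", "Transactions on Knowledge and Data Engineering");
    }

    // Returns the expansion of a known alias, or the input unchanged.
    // Matching ignores case and surrounding whitespace (an assumption).
    static String expand(String input) {
        String key = input.trim().toUpperCase(Locale.ROOT);
        return ALIASES.getOrDefault(key, input);
    }

    public static void main(String[] args) {
        System.out.println(expand("TKDE"));
    }
}
```

In the described system the table would presumably be loaded from the MySQL database rather than hard-coded.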
String mining: 
The user can download a string together with its meanings, and can also 
download its substrings, its reverse, etc. The user can also check whether a given string is 
present in the collection of strings: if it is present, the result is “String Found”; otherwise 
it is “String NotFound”. 
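The operations named in this module (reverse, substrings, and the membership check) can be sketched in Java as follows. The class and method names are hypothetical; the two result messages are taken verbatim from the text.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class StringMiningSketch {
    // Reverses the characters of the string.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // All non-empty contiguous substrings, ordered by start index, then by length.
    static List<String> substrings(String s) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                out.add(s.substring(start, end));
            }
        }
        return out;
    }

    // Exact-match membership check with the result messages quoted in the text.
    static String find(String query, Collection<String> strings) {
        return strings.contains(query) ? "String Found" : "String NotFound";
    }

    public static void main(String[] args) {
        System.out.println(reverse("string"));
        System.out.println(substrings("abc"));
        System.out.println(find("aa", List.of("a", "aa", "ab")));
    }
}
```

A string of length n has n(n+1)/2 such substrings, so listing them all is only reasonable for short strings.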
System Configuration: 
H/W System Configuration: 
Processor - Pentium III 
Speed - 1.1 GHz 
RAM - 256 MB (min) 
Hard Disk - 20 GB 
Floppy Drive - 1.44 MB 
Keyboard - Standard Windows Keyboard 
Mouse - Two or Three Button Mouse
Monitor - SVGA 
S/W System Configuration: 
Operating System - Windows 95/98/2000/XP 
Application Server - Tomcat 5.0/6.x 
Front End - HTML, Java, JSP 
Scripts - JavaScript 
Server-side Script - Java Server Pages 
Database - MySQL 
Database Connectivity - JDBC 
Conclusion: 
In this paper, we have proposed a new statistical learning approach to string transformation. 
Our method is novel in its model, learning algorithm, and string generation 
algorithm. Two specific applications are addressed with our method, namely spelling error 
correction of queries and query reformulation in web search. Experimental results on two large 
data sets and the Microsoft Speller Challenge show that our method improves upon the baselines 
in terms of accuracy and efficiency. Our method is particularly useful when the problem 
occurs on a large scale.
