International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 2652
A Novel Approach to Automatically Categorizing Software Technologies
Khushang Mehta1, Prof. Kore Kunal Sidramappa2
1,2Sharadchandra Pawar College of Engineering, Otur, Pune, India
---------------------------------------------------------------------------***---------------------------------------------------------------------------
Abstract—Software development is increasingly based on
reusable components in the form of frameworks and libraries, as
well as the programming languages and tools used with them.
Informal language and the absence of a standard taxonomy
for software technologies make it difficult to reliably analyze
technological trends in discussion forums and other online
sites. This paper builds on an automatic approach called Witt
for the categorization of software technologies. Witt takes as
input a sentence that describes a technology or a software
concept and returns a general category that describes it (for
example, integrated development environment), along
with attributes that qualify it further. By extension, the
approach allows the dynamic creation of lists of all
technologies of a given type. The system contributes the
Levenshtein distance algorithm to compare the similarity of
two strings; it operates on the character-level edit distance
between them. With this algorithm it is possible to categorize
entries drawn from a large data set.
Index Terms—hypernym, Lexicography, NLP, Software
technologies.
I. INTRODUCTION
Nowadays, software development is increasingly based
on reusable components in the form of frameworks and
libraries, as well as the programming languages and tools used
with them. Taken together, these software technologies form a
massive and rapidly growing catalog of building blocks
for systems, one that is difficult to monitor through
discussion channels. It is hard even to list all technologies of a
certain type, or to gauge their popularity relative to others of
that type. Questions like "What is the most popular web
application framework?" are important for many organizations,
for example when deciding which development tool to adopt at
the beginning of a project or which technology to develop a
driver for. Answers to these questions are routinely offered
without any supporting data, and valid empirical surveys are
difficult to find. To move to a rational, evidence-based approach
to monitoring the use of software technologies, we must be able
to automatically classify and group named mentions
of software technologies.
A. MOTIVATION
An important step towards machine understanding of
terminology is hypernym discovery, that is, the
discovery of the more general concept in an is-a relationship
(for example, AngularJS is a web application framework).
This has led to the development of many tools for automated
hypernym extraction. Unfortunately, the discovery of correct
hypernyms is not sufficient to support the detection and
monitoring of comparable software technologies. For example,
"cross-platform commercial IDE for PHP" is a valid hypernym for
PhpStorm, but the expression is too specific to make a useful
category of technologies. The categorization of software
technologies is a more complex problem that requires
greater abstraction and normalization.
B. OBJECTIVE
To extract general categories and related attributes from the
hypernyms. To retrieve matching data for a user-entered query.
II. REVIEW OF LITERATURE
1. Falleri et al. apply natural language processing to extract
important concepts from identifiers defined in source code,
aggregating them into a WordNet-like structure that includes
their hypernym relations [1].
2. One of the main limitations of WordNet for
software engineering applications is its lack of support for
specialized terminology. A number of projects have therefore
focused on the design of lexical databases that include a word
similarity relationship. This relationship can be computed
from co-occurrences in the context of forum posts [2].
3. In a study on the use of tags in a work item
management system, Treude and Storey found that
software developers had developed implicit and explicit
mechanisms for managing tag vocabularies [3].
4. This system computes textual similarity using the q-gram
similarity between the first paragraph of the selected article
section and the tag excerpt. The metric calculates the
proportion of common sequences of q tokens between two
strings. The result is a score between 0 (completely different)
and 1 (exact copies). The system uses the commonly used
value q = 2 for this metric, with words as tokens. Each word is
stemmed using the Porter stemmer, and letter case is
ignored [4].
5. WordNet is a lexical database that contains, among other
information, hypernym and hyponym relationships. The
database was created manually and is considered a gold
standard in many language applications. With WordNet, the
system first retrieves all the entries that match the tag.
A result is eliminated if its gloss does not
contain at least one programming-related term. For all the
remaining results, the system analyzes the words listed as
hypernyms in the result and considers all of those
hypernyms for the evaluation [5].
6. Saha et al. mine data from millions of questions on the
Q&A site Stack Overflow and, using a discriminative
model approach, automatically suggest question
tags to help a questioner choose appropriate tags for eliciting
a response [6].
7. Snow et al. present a new algorithm to learn hypernym
(is-a) relations from text, a key problem in machine learning
for natural language understanding. The method generalizes
previous work that relied on hand-built lexico-syntactic
patterns by introducing a general formalization of the
pattern space based on syntactic dependency paths [7].
8. Wang et al. propose a semantic graph to model the semantic
correlations between tags and words in software descriptions.
Based on the graph, they design an effective
algorithm to recommend tags for software. Comprehensive
experiments on large-scale open source software data sets,
compared against several typical related works,
demonstrate the effectiveness and efficiency of the
method in recommending appropriate tags [8].
9. Yang and Tan propose a simple and general technique to
automatically infer semantically related words in
software using the context of the words in comments and
code. In addition, they offer a ranking algorithm over the
rPair results and study cross-project pairs in two sets of
software with similar functionality, namely multimedia
browsers and operating systems [9].
10. Zesch et al. address one of the main obstacles to
exploiting large semantic knowledge bases, namely the lack of
adequate programmatic access to the knowledge stored in
them. They present two application programming interfaces,
for Wikipedia and Wiktionary, that are specifically designed
to extract the rich lexical semantic information in these
knowledge bases and that provide efficient, structured
access to the available knowledge [10].
III. SYSTEM ARCHITECTURE/ SYSTEM OVERVIEW
In the proposed system, the approach takes as input a term
to categorize. As its vocabulary of software technologies, the
system holds data on all the technologies together with their
tags. Given a tag, it retrieves all the data associated with the
corresponding technology. NLP and the Levenshtein distance
algorithm are then applied to find hypernyms. The final step
of the proposed system consists of transforming the hypernyms
into a set of categories, possibly with some attributes.
Categories are designed to represent general hypernyms, with
a focus on coverage: "commercial IDE for PHP" is a better
(more precise) hypernym than "IDE", but the latter is a better
category (higher coverage). The attributes are meant to
provide a flexible way to express the information lost when
transforming a hypernym into a category. They represent
typical variants of the category, but would not constitute
valid hypernyms on their own. To transform a hypernym into a
category with attributes, the system starts by removing all
non-informative phrases such as "name of" and "type of". It
also transforms phrases indicating a collection, e.g., "set of",
into the attribute "collection of" and removes them from the
hypernym. A small list of such phrases was constructed based
on the development set. If two or more occurrences of the word
"of" or of the word "for" remain in the hypernym, the system
does not parse it, as its structure is possibly too
complex for these simple heuristics.
The system contributes the Levenshtein distance algorithm to
compare the similarity of two strings. In information theory
and computer science, the Levenshtein distance is a string
metric for measuring the difference between two sequences.
Informally, the Levenshtein distance between two words is the
minimum number of single-character edits (insertions,
deletions or substitutions) required to change one word into
the other. The application works on computer science related
information.
Fig. 1. Proposed System Architecture
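The hypernym-to-category heuristics described in this section can be illustrated with a minimal sketch. It is written in Python for brevity (the paper lists Java as the implementation language) and is not the authors' code. The phrase lists and the function name to_category are hypothetical, since the paper only names "name of", "type of" and "set of" as examples, and the rejection rule is read here as "two or more remaining occurrences of either word".

```python
import re

# Hypothetical phrase lists: the paper gives only these examples,
# not the full lists built from its development set.
NON_INFORMATIVE = ("name of", "type of")
COLLECTION = ("set of",)


def to_category(hypernym):
    """Return (category, attributes), or None if the hypernym is too complex."""
    text = hypernym.lower()
    attributes = []

    # Drop phrases that carry no information about the category.
    for phrase in NON_INFORMATIVE:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)

    # Turn collection phrases into the "collection of" attribute.
    for phrase in COLLECTION:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            attributes.append("collection of")
            text = re.sub(pattern, " ", text)

    words = text.split()
    # Assumed reading of the rule: give up when "of" or "for" still
    # occurs two or more times.
    if words.count("of") >= 2 or words.count("for") >= 2:
        return None
    return " ".join(words), attributes
```

Under these assumptions, to_category("type of web framework") yields the category "web framework" with no attributes, while a phrase with two remaining occurrences of "for" is rejected.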
A. Algorithms
1) NLP:
Natural language processing (NLP) is a subfield of computer
science, information engineering and artificial intelligence
concerned with the interactions between computers and
human (natural) languages, in particular how to program
computers to process and analyze large amounts of natural
language data.
2) Similarity score:
The string similarity is computed from the q-gram similarity
between the first paragraph of the selected article section
and the tag excerpt. The similarity is
calculated for the first sentence of both texts, then the first
two sentences, the first three, and so on, until one of the
inputs runs out of sentences. The best similarity score
is kept as representative of the overall similarity between the
two inputs.
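The similarity score described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the paper does not state the exact formula for the "proportion of common q-grams", so a Jaccard-style ratio is assumed; Porter stemming is omitted to keep the example free of external libraries; and sentence splitting uses a simple punctuation rule. All function names are hypothetical.

```python
import re


def qgrams(text, q=2):
    """Set of q-grams over lower-cased word tokens (letter case is ignored)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + q]) for i in range(len(tokens) - q + 1)}


def qgram_similarity(a, b, q=2):
    """Assumed Jaccard-style proportion of shared q-grams, between 0 and 1."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        # Texts shorter than q tokens have no q-grams to compare.
        return 0.0
    return len(ga & gb) / len(ga | gb)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def best_prefix_similarity(a, b, q=2):
    """Compare the first 1, 2, 3, ... sentences of both texts; keep the best score."""
    sa, sb = split_sentences(a), split_sentences(b)
    best = 0.0
    for n in range(1, min(len(sa), len(sb)) + 1):
        score = qgram_similarity(" ".join(sa[:n]), " ".join(sb[:n]), q)
        best = max(best, score)
    return best
```

Taking the maximum over growing sentence prefixes means that a close match in the opening sentence is not diluted by unrelated text later in either input.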
3) Levenshtein distance algorithm:
The Levenshtein distance is a string metric that measures the
difference between two sequences. Informally, the
Levenshtein distance between two words is the minimum
number of single-character edits (i.e., insertions, deletions or
substitutions) required to change one word into the other.
The Levenshtein algorithm computes the smallest number of
edit operations needed to transform one string into
another, most commonly by dynamic programming. The
proposed system uses it to match the user-entered
question against the questions available in the database.
Input: The user-entered question.
Working:
Step 1: Select the user-entered query.
Step 2: Select all data from the available database.
Step 3: Pass the distance threshold used to match the query
against the available data. The system compares the entered
query with the available data, word by word.
Step 4: Visit each data entry in turn and keep those within
the specified distance of the query.
Output: The matched, similar data.
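The steps above can be sketched with the standard dynamic programming formulation of the Levenshtein distance. This is a minimal illustration under stated assumptions: the database is stood in for by a plain list of strings, whole strings are compared rather than individual words, and the names match_query and max_distance are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def match_query(query, entries, max_distance):
    """Return the entries within max_distance edits of the query, closest first."""
    q = query.lower()
    scored = [(levenshtein(q, entry.lower()), entry) for entry in entries]
    return [entry for distance, entry in sorted(scored) if distance <= max_distance]
```

Keeping only two rows of the distance table bounds memory by the length of the shorter comparison row, which matters when each query is compared against every stored entry.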
B. Mathematical Model
This model is used to measure the accuracy of the proposed
system in categorizing queries and result data. In the
field of information retrieval, precision is the fraction of
retrieved documents that are relevant to the query:
\[
\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}
\]
In information retrieval, recall is the fraction of the relevant
documents that are successfully retrieved.
\[
\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}
\]
In binary classification, recall is called sensitivity. It can be
viewed as the probability that a relevant document is
retrieved by the query.
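As a small worked illustration of these two measures, the sketch below computes them from sets of document identifiers. The identifiers and the function name precision_recall are made up for the example; returning 0.0 for an empty set is a convention chosen here, not something the paper specifies.

```python
def precision_recall(relevant, retrieved):
    """Compute (precision, recall) from two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant documents that were retrieved
    # Convention assumed here: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 relevant documents, 3 retrieved, 2 in common.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
```

With these inputs the precision is 2/3 and the recall is 2/4.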
C. HARDWARE AND SOFTWARE REQUIREMENTS
Hardware Requirements
1) Processor - Intel Core i5
2) Speed - 1.1 GHz
3) RAM - 2GB
4) Hard Disk - 40 GB
5) Keyboard - Standard Windows Keyboard
6) Mouse - Two or Three Button Mouse
7) Monitor - SVGA
Software Requirements
1) Operating System - Windows XP/7/8/10
2) Coding language - Java, MVC, JSP, HTML, CSS etc
3) Software - JDK1.7
4) Tool - Eclipse Luna
5) Server - Apache Tomcat 7.0
6) Database - MySQL 5.0
IV. SYSTEM ANALYSIS AND RESULT
Experimental setup: Table I shows the time the proposed system
takes to categorize documents according to the entered query
string. Fig. 2 gives a pictorial representation of matching time
against the number of documents: the x-axis shows the number
of documents and the y-axis the time taken to match the
query. The graph shows how search time in the proposed system
varies with the number of documents. In our
implementation, search time depends not only on the number
of documents returned, but also on the number of documents
in which the query to be categorized is present.
TABLE I
EXECUTION TIME TO CATEGORIZE THE ENTERED QUERY
FOR A GIVEN NUMBER OF DOCUMENTS
Index   No. of documents   Query    Time to categorize (ms)
1       1000               Query1   101
2       2000               Query2   140
3       3000               Query3   190
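To make the trend in Table I easier to read, the following sketch computes the extra time per additional document between consecutive rows. It is plain arithmetic on the reported values; because each row uses a different query, the figures are only a rough indication of how time scales. The function name is hypothetical.

```python
# (number of documents, time in ms) pairs copied from Table I.
rows = [(1000, 101), (2000, 140), (3000, 190)]


def marginal_ms_per_document(rows):
    """Extra milliseconds per additional document between consecutive rows."""
    return [
        (t2 - t1) / (n2 - n1)
        for (n1, t1), (n2, t2) in zip(rows, rows[1:])
    ]


rates = marginal_ms_per_document(rows)
```

Going from 1000 to 2000 documents adds 39 ms, and going from 2000 to 3000 adds 50 ms.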
V. CONCLUSION
In this paper, we proposed a novel domain-specific
technique to automatically produce an attributed category
structure describing an input phrase assumed to be a
software technology. The technique transforms
hypernyms into more abstract categories. Our approach takes
as input a term to categorize and uses NLP and the
Levenshtein distance algorithm.
Fig. 2. Categorization time versus number of documents.
ACKNOWLEDGMENT
The authors would like to thank the researchers and
publishers for making their resources available, and our
teachers for their guidance. We are thankful to the authorities
of Savitribai Phule University of Pune and the concerned
members of the cPGCON2019 conference for their constant
guidance and support. We are also thankful to the reviewers
for their valuable suggestions. We also thank the college
authorities for providing the required infrastructure and
support. Finally, we would like to extend our heartfelt gratitude
to our friends and family members.
REFERENCES
[1] J.-R. Falleri, M. Huchard, M. Lafourcade, and M. Dao,
Automatic extraction of a WordNet-like identifier network
from software, in 18th IEEE International Conference on
Program Comprehension (ICPC), 2010, pp. 413.
[2] SEWordSim: Software-specific word similarity database,
in Companion Proceedings of the 36th International
Conference on Software Engineering, 2014.
[3] C. Treude and M.-A. Storey, Work item tagging:
Communicating concerns in collaborative software
development, IEEE Transactions on Software Engineering,
vol. 38, no. 1, 2012.
[4] M. F. Porter, An algorithm for suffix stripping, Program,
vol. 14, no. 3, 1980.
[5] G. A. Miller, R. Beckwith, D. Gross, and K. J. Miller,
Introduction to WordNet: An on-line lexical database,
International Journal of Lexicography, vol. 3, no. 4, 1990.
[6] A. K. Saha, R. K. Saha, A discriminative model approach for
suggesting tags automatically for Stack Overflow
questions, in Proceedings of the 10th Working Conference
on Mining Software Repositories, 2013.
[7] R. Snow, D. Jurafsky, and A. Y. Ng, Learning Syntactic
Patterns for Automatic Hypernym Discovery, in
Proceedings of the 18th Annual Conference on Neural
Information Processing Systems, 2004.
[8] T. Wang, H. Wang, G. Yin, X. Li, and P. Zou, Tag
recommendation for open source software, Frontiers of
Computer Science, vol. 8, no. 1, 2014.
[9] J. Yang and L. Tan, SWordNet: Inferring semantically
related words from software context, Empirical Software
Engineering, pp. 131, 2013.
[10] T. Zesch, C. Müller, Extracting lexical semantic knowledge
from Wikipedia and Wiktionary, in Proceedings of the
Conference on Language Resources and Evaluation, electronic
proceedings, 2008.

More Related Content

PDF
An efficient tool for reusable software
PDF
Tracing Requirements as a Problem of Machine Learning
PDF
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
PDF
EXTRACTING ARABIC RELATIONS FROM THE WEB
PDF
IRJET - Recommendation System using Big Data Mining on Social Networks
PDF
Cohesive Software Design
PDF
Bug Triage: An Automated Process
PDF
Coverage-Criteria-for-Testing-SQL-Queries
An efficient tool for reusable software
Tracing Requirements as a Problem of Machine Learning
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
EXTRACTING ARABIC RELATIONS FROM THE WEB
IRJET - Recommendation System using Big Data Mining on Social Networks
Cohesive Software Design
Bug Triage: An Automated Process
Coverage-Criteria-for-Testing-SQL-Queries

What's hot (20)

PDF
IRJET- Review on Information Retrieval for Desktop Search Engine
PDF
WEB-BASED ONTOLOGY EDITOR ENHANCED BY PROPERTY VALUE EXTRACTION
PDF
Hybrid approach for generating non overlapped substring using genetic algorithm
PDF
A Heuristic Approach for Network Data Clustering
PDF
A web based approach: Acronym Definition Extraction
PDF
IRJET - Voice based Natural Language Query Processing
PDF
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
PDF
TECHNIQUES FOR COMPONENT REUSABLE APPROACH
PDF
IMPLEMENTATION OF DYNAMIC COUPLING MEASUREMENT OF DISTRIBUTED OBJECT ORIENTED...
PDF
IRJET- Natural Language Query Processing
PDF
IRJET - Online Assignment Plagiarism Checking using Data Mining and NLP
PDF
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...
PDF
IRJET - Cyberbulling Detection Model
PDF
Aj35198205
PDF
IRJET- Analysis and Detection of E-Mail Phishing using Pyspark
PDF
Improved spambase dataset prediction using svm rbf kernel with adaptive boost
PDF
Automatically inferring structure correlated variable set for concurrent atom...
PDF
Software Engineering Domain Knowledge to Identify Duplicate Bug Reports
PDF
IRJET- Analysis of Question and Answering Recommendation System
PDF
A NOVEL APPROACH TO ERROR DETECTION AND CORRECTION OF C PROGRAMS USING MACHIN...
IRJET- Review on Information Retrieval for Desktop Search Engine
WEB-BASED ONTOLOGY EDITOR ENHANCED BY PROPERTY VALUE EXTRACTION
Hybrid approach for generating non overlapped substring using genetic algorithm
A Heuristic Approach for Network Data Clustering
A web based approach: Acronym Definition Extraction
IRJET - Voice based Natural Language Query Processing
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
TECHNIQUES FOR COMPONENT REUSABLE APPROACH
IMPLEMENTATION OF DYNAMIC COUPLING MEASUREMENT OF DISTRIBUTED OBJECT ORIENTED...
IRJET- Natural Language Query Processing
IRJET - Online Assignment Plagiarism Checking using Data Mining and NLP
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...
IRJET - Cyberbulling Detection Model
Aj35198205
IRJET- Analysis and Detection of E-Mail Phishing using Pyspark
Improved spambase dataset prediction using svm rbf kernel with adaptive boost
Automatically inferring structure correlated variable set for concurrent atom...
Software Engineering Domain Knowledge to Identify Duplicate Bug Reports
IRJET- Analysis of Question and Answering Recommendation System
A NOVEL APPROACH TO ERROR DETECTION AND CORRECTION OF C PROGRAMS USING MACHIN...
Ad

Similar to IRJET- A Novel Approch Automatically Categorizing Software Technologies (20)

PDF
IRJET- Deep Web Searching (DWS)
PDF
Image Based Tool for Level 1 and Level 2 Autistic People
PDF
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
PDF
Zemanta Tech Talk at Audible
PDF
Context Driven Technique for Document Classification
PDF
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
PPTX
Search, Signals & Sense: An Analytics Fueled Vision
PDF
NLP in Web Data Extraction (Omer Gunes)
PDF
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
PDF
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
PDF
Aq35241246
PDF
40120140501013
DOCX
facilitating document annotation using content and querying value
PDF
NATURE: A TOOL RESULTING FROM THE UNION OF ARTIFICIAL INTELLIGENCE AND NATURA...
PDF
Quality, quantity, web and semantics
PDF
Quality, Quantity, Web and Semantics
PDF
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
PPTX
Automated Software Requirements Labeling
PPTX
Data Day Seattle, From NLP to AI
PDF
Automated building of taxonomies for search engines
IRJET- Deep Web Searching (DWS)
Image Based Tool for Level 1 and Level 2 Autistic People
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
Zemanta Tech Talk at Audible
Context Driven Technique for Document Classification
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
Search, Signals & Sense: An Analytics Fueled Vision
NLP in Web Data Extraction (Omer Gunes)
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Aq35241246
40120140501013
facilitating document annotation using content and querying value
NATURE: A TOOL RESULTING FROM THE UNION OF ARTIFICIAL INTELLIGENCE AND NATURA...
Quality, quantity, web and semantics
Quality, Quantity, Web and Semantics
ConNeKTion: A Tool for Exploiting Conceptual Graphs Automatically Learned fro...
Automated Software Requirements Labeling
Data Day Seattle, From NLP to AI
Automated building of taxonomies for search engines
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
UNIT 4 Total Quality Management .pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Structs to JSON How Go Powers REST APIs.pdf
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
web development for engineering and engineering
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPT
Mechanical Engineering MATERIALS Selection
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Construction Project Organization Group 2.pptx
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Welding lecture in detail for understanding
UNIT 4 Total Quality Management .pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Structs to JSON How Go Powers REST APIs.pdf
Model Code of Practice - Construction Work - 21102022 .pdf
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
UNIT-1 - COAL BASED THERMAL POWER PLANTS
OOP with Java - Java Introduction (Basics)
web development for engineering and engineering
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Mechanical Engineering MATERIALS Selection
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Construction Project Organization Group 2.pptx
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
bas. eng. economics group 4 presentation 1.pptx
Operating System & Kernel Study Guide-1 - converted.pdf
Welding lecture in detail for understanding

IRJET- A Novel Approch Automatically Categorizing Software Technologies

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 2652 A Novel Approch Automatically Categorizing Software Technologies Khushang Mehta1, Prof. Kore Kunal Sidramappa2 1,2Sharadchandra Pawar College of Engineering, Otur, Pune, India ---------------------------------------------------------------------------***--------------------------------------------------------------------------- Abstract—Software development is increasingly based on reusable components in the form of frames and libraries, as well as programming languages and tools to use them. Informal language and the absence of a standard taxonomy for software technologies make it difficult to reliably analyze technologi-cal trends in discussion forums and other online sites. The system proposes an automatic approach called Witt for the categorization of software technology. Witt takes as input a sentence that describes a technology or a software concept and returns a general category that describes it (for example, an integrated development environment), along with attributes that qualify it even more. By extension, the approach allows the dynamic creation of lists of all technologies of a given type.The system contribute Levenshtein distance algorithm to compare similarities between two stings.It work on character distances of two strings.With this algorithm it is possible to categorize the data from large data. Index Terms—hypernym, Lexicography, NLP, Software technologies. I. INTRODUCTION Now days the Software development is increasingly based on reusable components platform in the form of frames and libraries, as well as programming languages and tools to use them. 
Taken together, these software technologies form a massive and rapidly growing catalog of constituent elements for systems that becomes difficult to monitor through dis- cussion channels. The list of all technologies of a certain type or their popularity in relation to this type. Questions like ”what is the most popular web application framework?” They are important for many organizations, for example, to decide which development tool to adopt at the beginning of a project or for which technology to develop a driver. The answers to these questions are routinely proposed without any supporting data, but it is difficult to find valid empirical surveys. To move to a rationalized, evidence-based approach to monitor the use of software technologies, we must be able to automatically classify and group the nominated mentions of software technologies. A. MOTIVATION An important step towards understanding the terminology of the machine is the discovery of hypernyms, that is, the discovery of the more general concept in an is-a relationship (for example, AngularJS is a web application framework), which has led to the development of many tools hypernyms automated extraction. Unfortunately, the discovery of correct hypernyms is not efficient to support the detection and moni- toring of comparable software techniques. For example, the cross-platform commercial IDE for PHP is a hyper valid for Php Storm, but the expression is too specific to make a useful category of technologies. The categorization of software technologies is a much more complex problem that requires greater abstraction and normalization. B. OBJECTIVE To extract general categories and related attributes for the hypernyms.To get data for user entered query. II. REVIEW OF LITERATURE 1. Present natural language processing to extract important concepts from identifiers defined in source code, aggregating them into a WordNet-like structure that includes their hypernyms relation [1]. 2. 
Present One of the main limitations of WordNet for software engineering applications is the lack of support for specialized terminology. A number of projects have focused on the design of lexical databases that include a word similarity relationship. This relationship can be calculated from the co-occurrences in the context of a forum publication. [2] 3. Presented in a study on the use of labels in a book management system, Treude and Storey discovered that software developers had developed implicit and explicit mechanisms for managing label vocabularies. [3]. 4. Current system that calculates the textual similarity based on the similarity of grams between the first paragraph of the section of the selected article and the extract of the label. This metric calculates the proportion of common token q sequences between two strings. The result is a score between 0 (completely different) and 1 (exact copies). This system chose the commonly used value q = 2 for this metric, with words such as token. This system derived each word using Porter Stemmer and did not consider the envelope of the letter[4]. 5. WordNet is a lexical database that contains, among other information, hyperimone and hyperimone relationships. The database was created manually and is considered a gold standard in many language applications. With WordNet, the system first retrieved all the words that matched the label. This system eliminated a result if its brightness did not
contain at least one programming-related term. For all the remaining results, the system analyzed the words listed as hypernyms in the result and considered all of those hypernyms for the evaluation [5].
6. This system mines data from millions of questions on the Q&A site Stack Overflow and, using a discriminative model approach, automatically suggests question tags to help a questioner choose appropriate tags for eliciting a response [6].
7. This work presents a new algorithm to learn hypernym (is-a) relationships from text, a key problem in machine learning for natural language understanding. The method generalizes previous work based on hand-built lexico-syntactic patterns by introducing a general formalization of the pattern space based on syntactic dependency paths [7].
8. This work proposes a semantic graph to model semantic correlations between tags and words in software descriptions. Based on the graph, the authors design an effective algorithm to recommend tags for software. Extensive experiments on large-scale open source software data sets, comparing against typical related work, demonstrate the effectiveness and efficiency of the method in recommending appropriate tags [8].
9. This paper proposes a simple and general technique to automatically infer semantically related words in software using the context of words in comments and code. In addition, it offers a classification algorithm for the rPair results and studies cross-project pairs in two sets of software with similar functionality, namely multimedia browsers and operating systems [9].
10.
This paper addresses a main obstacle to using large semantic knowledge bases, namely the lack of adequate mechanisms for programmatic access to the knowledge they store. It presents two application programming interfaces for Wikipedia, specifically designed to extract the rich lexical semantic information in such knowledge bases and to provide efficient, structured access to the available knowledge [10].
III. SYSTEM ARCHITECTURE / SYSTEM OVERVIEW
The proposed approach takes as input a term to categorize. As its vocabulary of software technologies, the system holds data on all the technologies, from which it obtains the labels (tags). For a given label, it retrieves all the associated data from the different technologies, and then applies NLP and the Levenshtein distance algorithm. Hypernyms are then extracted. The final step of the proposed system consists of transforming the hypernyms into a set of categories, possibly with some attributes. Categories are designed to represent general hypernyms, with a focus on coverage: "commercial ide for php" is a better (more precise) hypernym than "ide", but the latter is a better category (higher coverage). The attributes are meant to provide a flexible way to express the information lost when transforming a hypernym into a category. They represent typical variants of the category, but would not constitute valid hypernyms on their own. To transform a hypernym into a category with attributes, the system starts by removing all non-informative phrases such as "name of" and "type of". It also transforms phrases indicating a collection, e.g., "set of", into the attribute "collection of", and removes them from the hypernym. The system uses a small list of such phrases constructed from our development set.
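The phrase-based steps described so far can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `to_category`, the two phrase lists, and the lowercasing step are hypothetical, and the text names only "name of", "type of" and "set of" as example phrases.

```python
import re
from typing import List, Tuple

# Hypothetical phrase lists: the paper mentions only these examples and says
# the real list was built from a development set.
NON_INFORMATIVE = ("name of", "type of")
COLLECTION = ("set of",)


def to_category(hypernym: str) -> Tuple[str, List[str]]:
    """Return (category, attributes) for a hypernym phrase."""
    # Assumption: normalise case and whitespace before matching phrases.
    text = " ".join(hypernym.lower().split())
    attributes: List[str] = []

    # Collection phrases become the attribute "collection of"
    # and are removed from the hypernym.
    for phrase in COLLECTION:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            if "collection of" not in attributes:
                attributes.append("collection of")
            text = re.sub(pattern, " ", text)

    # Non-informative phrases are simply dropped.
    for phrase in NON_INFORMATIVE:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)

    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split()), attributes
```

For instance, `to_category("Set of JavaScript libraries")` yields the category `javascript libraries` with the attribute `collection of`, while a phrase containing none of the listed phrases is returned unchanged apart from normalisation.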
If two or more occurrences of the word "of" or of the word "for" remain in the hypernym, the system does not parse it, as its structure may be too complex for these simple heuristics.
This work contributes the use of the Levenshtein distance algorithm to compare the similarity of two strings. In information theory and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. The application works on computer science related information.
Fig. 1. Proposed System Architecture
A. Algorithms
1) NLP: Natural language processing (NLP) is a subfield of computer science, information engineering and artificial intelligence that deals with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large quantities of natural language data.
2) Similarity score: The string similarity is calculated as the q-gram similarity between the first paragraph of the selected article section and the tag excerpt. The similarity is calculated for the first sentence of both texts, then the first two sentences, the first three, and so on, until one of the inputs
runs out of sentences. The best similarity score is kept as representative of the overall similarity between the two inputs.
3) Levenshtein distance algorithm: The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. The algorithm calculates the least number of edit operations needed to transform one string into another, most commonly using dynamic programming. The proposed system uses it to match the user-entered question against the questions available in the database.
Input: The user-entered question.
Working:
Step 1. Select the user-entered query.
Step 2. Select all data from the available database.
Step 3. Use the distance to match the query against the available data, comparing the question word by word with the available answers.
Step 4. Visit each data item in turn and retrieve those within the specified distance of the query.
Output: The matched similar data.
B. Mathematical Model
Precision and recall are used to calculate the accuracy of the proposed system over the query and the result data. In the field of information retrieval, precision is the fraction of retrieved documents that are relevant to the query:
\[ \text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} \]
In information retrieval, recall is the fraction of the relevant documents that are successfully retrieved:
\[ \text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} \]
In binary classification, recall is called sensitivity. It can be viewed as the probability that a relevant document is retrieved by the query.
C. HARDWARE AND SOFTWARE REQUIREMENTS
Hardware Requirements
1) Processor - Intel Core i5
2) Speed - 1.1 GHz
3) RAM - 2 GB
4) Hard Disk - 40 GB
5) Keyboard - Standard Windows Keyboard
6) Mouse - Two or Three Button Mouse
7) Monitor - SVGA
Software Requirements
1) Operating System - Windows XP/7/8/10
2) Coding language - Java, MVC, JSP, HTML, CSS, etc.
3) Software - JDK 1.7
4) Tool - Eclipse Luna
5) Server - Apache Tomcat 7.0
6) Database - MySQL 5.0
IV. SYSTEM ANALYSIS AND RESULT
Experimental setup
Table I shows that the proposed system categorizes documents according to the entered string in an efficient time. Fig. 2 plots the matching time against the number of documents: the x-axis shows the number of documents and the y-axis the time to match the query. The graph shows how search time in the proposed system varies with the number of documents. In our implementation, search time depends not only on the number of documents returned, but also on the number of documents in which the query to be categorized is present.
TABLE I
EXECUTION TIME TO CATEGORIZE THE ENTERED QUERY, BY NUMBER OF DOCUMENTS
Index   No. of documents   Query    Time to categorize (ms)
1       1000               Query1   101
2       2000               Query2   140
3       3000               Query3   190
V. CONCLUSION
This paper proposed a novel domain-specific technique to automatically produce an attributed category structure describing an input phrase assumed to be a software technology. The technique transforms hypernyms into more abstract categories. Our approach takes
as input a software term to categorize. It uses NLP and the Levenshtein distance algorithm.
Fig. 2. Time to categorize versus number of documents.
ACKNOWLEDGMENT
The authors would like to thank the researchers as well as the publishers for making their resources available, and their teachers for their guidance. We are thankful to the authorities of Savitribai Phule University of Pune and the concerned members of the cPGCON2019 conference for their constant guidance and support. We are also thankful to the reviewers for their valuable suggestions. We also thank the college authorities for providing the required infrastructure and support. Finally, we would like to extend our heartfelt gratitude to friends and family members.
REFERENCES
[1] J.-R. Falleri, M. Huchard, M. Lafourcade, and M. Dao, Automatic extraction of a WordNet-like identifier network from software, in 18th IEEE International Conference on Program Comprehension (ICPC), 2010, pp. 413.
[2] SEWordSim: Software-specific word similarity database, in Companion Proceedings of the 36th International Conference on Software Engineering, 2014.
[3] C. Treude and M.-A. Storey, Work item tagging: Communicating concerns in collaborative software development, IEEE Transactions on Software Engineering, vol. 38, no. 1, 2012.
[4] M. F. Porter, An algorithm for suffix stripping, Program, vol. 14, no. 3, 1980.
[5] G. A. Miller, R. Beckwith, D. Gross, and K. J. Miller, Introduction to WordNet: An on-line lexical database, International Journal of Lexicography, vol. 3, no. 4, 1990.
[6] A. K. Saha, R. K. Saha, A discriminative model approach for suggesting tags automatically for Stack Overflow questions, in Proceedings of the 10th Working Conference on Mining Software Repositories, 2013.
[7] R.
Snow, D. Jurafsky, and A. Y. Ng, Learning Syntactic Patterns for Automatic Hypernym Discovery, in Proceedings of the 18th Annual Conference on Neural Information Processing Systems, 2004.
[8] T. Wang, H. Wang, G. Yin, X. Li, and P. Zou, Tag recommendation for open source software, Frontiers of Computer Science, vol. 8, no. 1, 2014.
[9] J. Yang and L. Tan, SWordNet: Inferring semantically related words from software context, Empirical Software Engineering, pp. 131, 2013.
[10] T. Zesch, C. Müller, Extracting lexical semantic knowledge from Wikipedia and Wiktionary, in Proceedings of the Conference on Language Resources and Evaluation, electronic proceedings, 2008.