Rule based Production Systems for Automatic Code Generation in Java



        Imran Sarwar Bajwa¹, M. Imran Siddique² and M. Abbas Choudhary³

        ¹ Department of Computer Science and Information Technology
          The Islamia University of Bahawalpur, Pakistan
          bajwa@buitms.edu.pk

        ² Faculty of Computer and Emerging Sciences
        ³ Balochistan University of Information Technology and Management Sciences
          Quetta, Pakistan
          imran@buitms.edu.pk
          abbas@buitms.edu.pk


Abstract

Unified Modeling Language (UML) is used as a premier tool for modeling user requirements, and CASE tools provide an easy way to get efficient solutions. This paper presents a natural language processing based automated system that generates code in multiple languages after modeling the user requirements in UML. UML diagrams are first generated by analyzing the business scenario provided by the user. A new model is presented for analyzing natural language and extracting the relevant and required information from the requirement notes given by the user. The user writes the requirements in simple English in a few paragraphs, and the designed system analyzes the given script. After compound analysis and extraction of the associated information, the designed system draws various UML diagrams: activity diagrams, sequence diagrams, class diagrams and use case diagrams. The designed system then creates code automatically, without an external environment. It provides a quick and reliable way to generate UML diagrams and the respective code, saving the time and budget of both the user and the system analyst.

1. Introduction

In the current age, the tools and techniques of software engineering have changed considerably. Now every step of software engineering follows the rules of object-oriented design patterns. The same is true of the software process, which uses the Unified Modeling Language for modeling user requirements. At present, the software available for drawing UML diagrams is limited to tools such as Rational Rose and SmartDraw. There is no doubt that these are reasonably good software, but they have many drawbacks.

Conventionally, the system analyst has to do a lot of work to deduce the business logic and understand the user requirements before drawing the UML diagrams with conventional CASE tools. Much time is therefore wasted, because the available CASE tools offer little help for the required scenario. In today's world everybody needs a quick and reliable service, so there is a need for intelligent software that generates UML-based documentation and saves the time and budget of both the user and the system analyst.

To resolve these issues, we need software that facilitates both users and software engineers. As far as time is concerned, exploring all the facilities and services of this software should take well under a minute, which is quite useful for the users.




      1-4244-0682-X/06/$20.00 ©2006 IEEE


Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on July 27, 2009 at 03:53 from IEEE Xplore. Restrictions apply.
2. Problem Description

The problem specifically addressed in this research relates primarily to the analysis and design phase of the software development process. A few years ago, data flow diagrams were used to symbolize the flow of data and represent the user's requirements. Today, the Unified Modeling Language is used to model and map the user requirements. It is a more comprehensive and authentic way of representation, and it is beneficial for the later stages of software development.

After modeling and mapping the user requirements, the next phase is to create the program code in a particular computer language. Code hand-written by a programmer is the conventional solution to this problem, but it is time consuming. Modern software engineering requires quick, automated solutions that can create more than half of the code, so that the programmer can build the application by making appropriate adjustments and alterations to the automatically generated code, with less effort and in less time than the traditional approaches require.

3. Problem’s Solution

The conducted research provides a solution to the addressed problem in the form of the Multi-lingual Code Generator (MCG). The functionality of the current system is domain specific, but it can easily be enhanced in the future according to the requirements. The current system maps user requirements by reading the requirements given in plain text and drawing a set of UML diagrams: class diagram, activity diagram, sequence diagram, use case diagram and component diagram.

After drawing the UML diagrams, the designed system creates code in various languages such as Java, Visual Basic, C# and C++. An Integrated Development Environment is also provided for user interaction and efficient input and output.

4. Natural Language Processing

The understanding and multi-aspect processing of natural languages, which are also termed "speech languages", is one of the topics of greatest interest in the field of artificial intelligence [6]. Natural languages are irregular and asymmetrical. Traditionally, natural languages are based on informal grammars. Geographical, psychological and sociological factors influence the behaviour of natural languages [17]. Their sets of words are not fixed, and they change and vary from area to area and from time to time. Due to these variations and inconsistencies, natural languages have different flavours; English, for example, has more than half a dozen renowned flavours all over the world. These flavours have different accents, vocabularies and phonological aspects. These discrepancies and inconsistencies make natural languages more difficult to process than formal languages [13].

In the process of analyzing and understanding natural languages, researchers usually face various problems. The problems connected to the greater complexity of natural language include verb conjugation, inflexion, lexical amplitude and ambiguity. Of these, ambiguity causes the most difficulty. Ambiguity can be handled at the syntactic and semantic levels by using a sound and robust rule-based system.

5. Object-Oriented Analysis and Design

Analysis and design of an information system is concerned with understanding the problem and devising the framework that accomplishes the actual job. Typically, design is concerned with managing and controlling complexity in a domain. A robust design method also helps to split big tasks into manageable pieces (Condamines, 2001). In software engineering, design methods provide various notations, usually graphical ones. These notations make it possible to record and communicate lasting design decisions. Object-oriented design has superseded traditional analysis and design techniques such as structured design and data-driven design (Androutsopoulos, 1995). Compared to older design paradigms, object-oriented design models every active entity of the problem domain using the concept of objects. Objects have:

      State (shape and condition)
      Behaviour (what they perform)

Object-oriented languages use variables to represent the state of an object and methods or procedures to implement its behaviour. For example, a ball could be an object. Its state has parameters such as colour, size, diameter, shape, type, etc. This object can also have behaviour such as throw, roll, catch, hit, etc. The major task in the analysis and design phase is to identify the valid objects and specify their states and behaviours. In conventional methods, the system analyst performs this tough job and then maps this information into UML using a graphical tool such as Visio or Rational Rose.

In the context of this research, objects are automatically identified from a problem domain. The user provides input text in English related to the business domain. After the lexical analysis of the text, syntax analysis is performed at the word level to recognize the word category [4]. First of all the available lexicons are categorised into
nouns, pronouns, prepositions, adverbs, articles, conjunctions, etc. The syntactic analysis must be able to isolate subjects, verbs, objects, adverbs, adjectives and various other complements. It is a somewhat complex, multipart procedure. Consider the sentence:

   "Zia is playing with the red ball."

For this example, the output is as follows.

   Lexicons      Phase-I         Phase-II
   Zia           Noun            Object
   is            Helping-Verb    -------
   playing       Verb            Method
   with          Preposition     -------
   the           Article         -------
   red           Adjective       Attribute
   ball          Noun            Object

This is the final output of the lexical assessment phase: all nouns are marked as objects, all verbs are marked as methods, and all adjectives are marked as states (attributes) of the particular object. In the above example there are two objects, 'Zia' and 'ball'. 'Playing' is a method of 'Zia' and 'red' is an attribute of the object 'ball'.

6. Used Methodology

Conventional natural language processing systems use rule-based systems. Agents are another way to address this problem [8]. In this research, a rule-based algorithm has been used to read, understand and extract the desired information. First, the basic elements of the language grammar, such as verbs, nouns and adjectives, are extracted. Further processing is then performed on the basis of this extracted information. In linguistic terms, verbs often specify actions, and noun phrases specify the objects that participate in the action [16]. Each noun phrase's role then specifies how the object participates in the action, as in the following example:

   "Robbie hit a ball with a racket."

A procedure that understands such a sentence must discover that Robbie is the agent because he performs the action of hitting, that the ball is the thematic object because it is the object hit, and that the racket is an instrument because it is the tool with which the hitting is done. Thus, sentence analysis requires, in part, the answers to these questions. The number of thematic roles embraced by various theories varies considerably. Some people use about half a dozen thematic roles [9]. Others use several times as many. The exact number does not matter much, as long as it is great enough to expose natural constraints on how verbs and thematic-role instances form sentences.

   Actor: The actor causes the action to occur, as in "Zahid hits the ball." Zahid is the actor who performs the task.

   Co-actor: The word "with" may introduce a joint phrase that serves as a partner to the principal actor. The two carry out the action together: "Zahid played tennis with Ali."

   Recipient: The recipient is the person for whom an action has been performed: "Ali bought the balls for Zahid." In this sentence Zahid is the beneficiary.

   Thematic object: The thematic object is the object the sentence is really all about. Often the thematic object is the same as the syntactic direct object, as in "Zahid hit the ball." Here the ball is the thematic object.

   Transportation: The transportation is something in which or on which one travels: "Aslam always goes by train."

   Path: Motion from source to destination takes place over a path. Several prepositions can serve to introduce path (trajectory) noun phrases: "Ali and Zahid went to Islamabad through Lahore."

   Position: The position is where an action occurs. As in the path role, several prepositions are possible, each of which conveys meaning in addition to serving as a signal that a position noun phrase is coming: "Aslam and Ahmed studied in the library, at a desk, by the wall, near the door."

   Period: Period specifies when an action occurs. Prepositions such as at, before and after introduce noun phrases serving as time role fillers: "Ahmed and Ali left before noon."

   Interval: Interval specifies how long an action takes. Prepositions such as for indicate duration: "Aslam and Ahmed jogged for an hour."

7. Architecture of Designed System

The designed MCG system draws UML diagrams after reading the text scenario provided by the user. The system works in five modules: text input acquisition, text understanding, knowledge extraction, generation of UML diagrams and, finally, multi-lingual code generation, as shown in the following figure.

                         Text
                         Text Input Acquisition
                         Text Understanding
                         Knowledge Extraction
                         UML Diagrams
                         Multi-lingual Code Generation
                         Code

   Figure 1: Architecture of the designed MCG system

7.1. Text input acquisition

This module acquires the input text scenario. The user provides the business scenario in the form of paragraphs of text. The module reads the input text character by character and generates words by concatenating the input characters. This module implements the lexical phase: lexicons and tokens are generated here.

7.2. Text understanding

This module reads the input from module 1 in the form of words. These words are categorized into various classes such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions, conjunctions, etc.

7.3. Knowledge extraction

This module extracts the different objects and classes and their respective attributes on the basis of the input provided by the preceding module. Nouns are symbolized as classes and objects, and their associated adjectives are termed attributes.

   Figure 2: Extracting classes, functions and attributes

7.4. UML diagram generation

This module uses UML symbols to constitute the various UML diagrams, combining the available symbols according to the information extracted by the previous module. Since a separate scenario is provided for each kind of diagram (class, sequence, activity and use case diagrams), separate functions are implemented for the respective diagrams.

   Figure 3: Generating class diagrams

7.5. Multi-lingual code generation

This is the last module, which ultimately generates code in popular languages such as Java, C#.Net and VB.Net to help the programmer. The generated code is structured according to the knowledge extracted in the previous modules and the UML diagrams generated.

   Figure 4: Generating code in Java language
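
The flow from text input to generated code can be illustrated with a small Java sketch, using the example sentence "Zia is playing with the red ball." This is only a hedged illustration under stated assumptions, not the authors' implementation: the class McgSketch, the helper ClassModel, the tiny hand-written lexicon, the rule that a verb belongs to the nearest preceding noun and an adjective to the next noun, and the String type of generated attributes are all hypothetical choices made for this example. The UML diagram generation step is skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: tokenize, tag, extract a class model, generate Java. */
class McgSketch {

    /** Hypothetical container for one extracted class. */
    static class ClassModel {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final List<String> methods = new ArrayList<>();
        ClassModel(String name) { this.name = name; }
    }

    // Assumed toy lexicon; a real system would need a full dictionary or tagger.
    static final Map<String, String> LEXICON = Map.of(
        "zia", "Noun", "ball", "Noun", "is", "Helping-Verb",
        "playing", "Verb", "with", "Preposition", "the", "Article",
        "red", "Adjective");

    /** Text input acquisition: build words by concatenating letters. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) words.add(current.toString());
        return words;
    }

    /** Text understanding: look up the word category, "Unknown" if absent. */
    static String tag(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(), "Unknown");
    }

    /** Knowledge extraction: nouns to classes, verbs to methods, adjectives to attributes. */
    static Map<String, ClassModel> extract(List<String> words) {
        Map<String, ClassModel> classes = new LinkedHashMap<>();
        ClassModel lastNoun = null;
        List<String> pendingAdjectives = new ArrayList<>();
        for (String word : words) {
            switch (tag(word)) {
                case "Noun":
                    lastNoun = classes.computeIfAbsent(capitalize(word), ClassModel::new);
                    lastNoun.attributes.addAll(pendingAdjectives); // adjectives seen before this noun
                    pendingAdjectives.clear();
                    break;
                case "Verb":
                    if (lastNoun != null) lastNoun.methods.add(word.toLowerCase());
                    break;
                case "Adjective":
                    pendingAdjectives.add(word.toLowerCase());
                    break;
                default:
                    break; // helping verbs, prepositions and articles carry no model information here
            }
        }
        return classes;
    }

    static String capitalize(String w) {
        return Character.toUpperCase(w.charAt(0)) + w.substring(1).toLowerCase();
    }

    /** Code generation: emit a Java class skeleton (attribute type is a placeholder). */
    static String generateJava(ClassModel model) {
        StringBuilder sb = new StringBuilder("class " + model.name + " {\n");
        for (String a : model.attributes) sb.append("    private String ").append(a).append(";\n");
        for (String m : model.methods) sb.append("    void ").append(m).append("() { }\n");
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<String> words = tokenize("Zia is playing with the red ball.");
        for (ClassModel model : extract(words).values()) {
            System.out.print(generateJava(model));
        }
    }
}
```

Running main prints a skeleton class Zia with an empty method playing() and a skeleton class Ball with one attribute red, which matches the Phase-II mapping of the example in Section 5. Generators for the other target languages would only need a different version of generateJava, since the extracted class model is language independent.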

8. Accuracy Evaluation                                                                         that he has followed the prerequisites of the software to
                                                                                                 prepare the input scenario. The given scenario should be
  To test the accuracy of the diagrams generated by the                                          complete and written in simple and correct English. Under
 designed system four parameters had been decided. Each                                          the scope of our project, software will perform a complete
 generated diagram from each category was checked. Max-                                          analysis of the scenario to find the classes, their attributes
 imum score was declared 25. According to the wrong nom-                                         and operations. It will also draw the diagrams as class dia-
 inations and extractions, the points were detected. A matrix                                    grams, activity diagrams, use-case diagrams and sequence
 of results of generated diagrams is shown below.                                                diagrams.

   Table 1 - Testing results of different UML Diagrams                                           An elegant graphical user interface has also been provided
                                                                                                 to the user for entering the Input scenario in a proper way
                                                                                                 and generating UML diagrams.
  Dig. Types Objects Attributes Sequence labeling Total

    Class              22             24              20             19        85%               10.Future Work
    Activity           23             21              16             20        80%                The designed system for generating UML diagrams and
                                                                                                 their respective code in multiple languages was started
    Sequence           21             22              13             18        74%
                                                                                                 with the aims that there should be a software which can
    Use case           21             24              21             22        88%               read the scenario given in English language and can draw
                                                                                                 the all types of the UML diagrams such as Class diagram,
                                                                                                 activity diagram, sequence diagram, use case diagram,
  A matrix representing UML diagrams accuracy test (%)
                                                                                                 component diagram, deployment diagram. But last two of
 for class, activity, sequence and use case diagrams has
                                                                                                 them component diagram, deployment diagram are still un-
 been constructed. Overall diagrams accuracy for all types
                                                                                                 touched.
 of UML diagrams is determined by adding total accuracy
 of all categories and calculating average of it. Following                                       There is also some margin of improvements in the
 graph is showing the accuracy ratio of various diagram                                          algorithms for generating first four types Class diagram,
 types in terms of objects, attributes, sequence and labeling                                    activity diagram, sequence diagram, use case diagram.
 parameters.                                                                                     Current accuracy of generating diagrams is about 80% to
                                                                                                 85%. It can be enhanced up to 95% by improving the
                                                                                                 algorithms and inducing the ability of learning in the sys-
                                                                                                 tem.

                                                                                                 11.References
                                                                                                 [1] Allen,J. (1994) Natural Language Understanding.
                                                                                                 Benjamin- Cummings Publishing Company, New York.
                                                                                                 [2] Biber, D., Conrad, S., & Reppen, R. (1998). Corpus
                                                                                                 Linguistics: Investigating Language Structure and Use.
                                                                                                 Cambridge Univ. Press, Cambridge, U.K.
                                                                                                 [3] Blaschke,C., Andrade,M.A., Ouzounis,C. and
                                                                                                 Valencia,A. (1999) Automatic extraction of biological
    Figure 5: Accuracy Ratio of various diagram types                                            information from scientific text: protein–protein
                                                                                                 interactions. Ismb, 60–67.
  9. Conclusion                                                                                  [4] C. A. Thompson, R. J. Mooney and L. R. Tang,
                                                                                                 Learning to parse natural language database queries into
  This research is all about the dynamic generation of the                                       logical form, in: Workshop on Automata Induction,
 UML diagrams and their respective code by reading and                                           Grammatical Inference and Language Acquisition (1997).
 analyzing the given scenario in English language provided by
 the user. The designed system can find out the classes and                                      [5] Chomsky, N. (1959). On certain formal properties of
 objects and their attributes and operations using an artificial                                 grammars. Information and Control, 2(2), 137–167.
 intelligence technique such as natural language processing.                                     [6] Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT
 Then the UML diagrams such as Activity dig., Sequence                                           Press, Cambridge, Mass. Chow, C., & Liu, C. (1968).
 dig., Component dig., Use Case dig., etc would be drawn.                                        Approximating discrete probability distributions with
 The accuracy of the software is expected up to about 80%                                        dependence trees. IEEE Transactions on Information Theory,
 with the involvement of the software engineer provided                                          IT-14(3), 462–467.




[7] Fagan, J. L. (1989). The effectiveness of a non-syntactic approach to
automatic phrase indexing for document retrieval. Journal of the American
Society for Information Science, 40(2), 115–132.
[8] Zelle, J. M., & Mooney, R. J. (1993). Learning semantic grammars with
constructive inductive logic programming. In Proceedings of the 11th National
Conference on Artificial Intelligence (pp. 817–822). AAAI Press/MIT Press,
Washington, D.C.
[9] Kowalski, G. (1998). Information Retrieval Systems: Theory and
Implementation. Kluwer, Boston.
[10] Krovetz, R., & Croft, W. B. (1992). Lexical ambiguity and information
retrieval. ACM Transactions on Information Systems, 10, 115–141.
[11] Losee, R. M. (1988). Parameter estimation for probabilistic document
retrieval models. Journal of the American Society for Information Science,
39(1), 8–16.
[12] Losee, R. M. (1996a). Learning syntactic rules and tags with genetic
algorithms for information retrieval and filtering: An empirical basis for
grammatical rules. Information Processing and Management, 32(2), 185–197.
[13] Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural
Language Processing. MIT Press, Cambridge, Mass.
[14] Maron, M. E., & Kuhns, J. L. (1960). On relevance, probabilistic
indexing, and information retrieval. Journal of the ACM, 7, 216–244.
[15] Partee, B. H., Meulen, A. t., & Wall, R. E. (1990). Mathematical Methods
in Linguistics. Kluwer, Dordrecht, The Netherlands.
[16] Salton, G., & McGill, M. (1983). Introduction to Modern Information
Retrieval. McGraw-Hill, New York.
[17] Weiss, S., Apte, C., Damerau, F., Johnson, D., Oles, F., Goetz, T., &
Hampp, T. (1999). Maximizing text-mining performance. IEEE Intelligent
Systems, 14, 63–69.
[18] Strzalkowski, T. (1995). Natural language information retrieval.
Information Processing and Management, 31(3), 397–417.
[19] Van Rijsbergen, C. (1977). A theoretical basis for use of co-occurrence
data in information retrieval. Journal of Documentation, 33(2), 106–119.






Automated Java Code Generation (ICDIM 2006)
