Communication And Power Engineering R Rajesh Editor B Mathivanan Editor
R. Rajesh, B. Mathivanan (Eds.)
Communication and Power Engineering
Communication and Power Engineering
Edited by
R. Rajesh, B. Mathivanan
Editors
Dr. R. Rajesh
Central University of Kerala
India
kollamrajeshr@gmail.com
Dr. B. Mathivanan
Sri Ramakrishna Engg. College
India
mathivanan.bala@srec.ac.in
ISBN 978-3-11-046860-1
e-ISBN (PDF) 978-3-11-046960-8
Set-ISBN 978-3-11-046961-5
Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
© 2016 Walter de Gruyter GmbH, Berlin/Boston
Printing and binding: CPI books GmbH, Leck
cover image: Thinkstock/tStockbyte
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Committees
Honorary Chair
Dr. Shuvra Das (University of Detroit Mercy, USA)
Dr. Jiguo Yu (Qufu Normal University, China)
Technical Chair
Dr. Sumeet Dua (Louisiana Tech University, USA)
Dr. Amit Banerjee (The Pennsylvania State University, USA)
Dr. Narayan C Debnath (Winona State University, USA)
Dr. Xiaodi Li (Shandong Normal University, China)
Technical Co-Chair
Dr. Natarajan Meghanathan (Jackson State University, USA)
Dr. Hicham Elzabadani (American University in Dubai)
Dr. Shahrokh Valaee (University of Toronto, Canada)
Chief Editors
Dr. Rajesh R (Central University of Kerala, India)
Dr. B Mathivanan (Sri Ramakrishna Engg. College, India)
General Chair
Dr. Janahanlal Stephen (Matha College of Technology, India)
Dr. Yogesh Chaba (Guru Jambeswara University, India)
General Co-Chair
Prof. K. U Abraham (Holykings College of Engineering, India)
Publicity Chair
Dr. Amit Manocha (Maharaja Agrasen Institute of Technology, India)
Finance Chair
Dr. Gylson Thomas (Jyothi Engineering College, India)
Dr. Ilias Maglogiannis (University of Central Greece)
Publicity Co-Chair
Prof. Ford Lumban Gaol (University of Indonesia)
Dr. Amlan Chakrabarti (University of Calcutta, India)
Prof. Prafulla Kumar Behera, PhD (Utkal University, India)
Publication Chair
Dr. Vijayakumar (NSS Engg. College, India)
Dr. T.S.B.Sudarshan (BITS Pilani, India)
Dr. KP Soman (Amritha University, India)
Prof. N.Jaisankar (VIT University, India)
Dr. Rajiv Pandey (Amity University, India)
Program Committee Chair
Dr. Harry E. Ruda (University of Toronto, Canada)
Dr. Deepak Laxmi Narasimha (University of Malaya, Malaysia)
Dr. N. Nagarajan (Anna University, Coimbatore, India)
Prof. Akash Rajak (Krishna Institute of Engg. & Tech., UP, India)
Prof. M Ayoub Khan (CDAC, NOIDA, India)
Program Committee
Prof. Shelly Sachdeva (Jaypee Institute of Information & Technology
University, India)
Prof. Pradheep Kumar K (SEEE, India)
Mrs. Rupa Ashutosh Fadnavis (Yeshwantrao Chavan College
of Engineering, India)
Dr. Shu-Ching Chen (Florida International University, USA)
Dr. Stefan Wagner (Fakultät für Informatik Technische Universität
München, Boltzmannstr)
Prof. Juha Puustjärvi (Helsinki University of Technology)
Dr. Selwyn Piramuthu (University of Florida)
Dr. Werner Retschitzegger (University of Linz, Austria)
Dr. Habibollah Haro (Universiti Teknologi Malaysia)
Dr. Derek Molloy (Dublin City University, Ireland)
Dr. Anirban Mukhopadhyay (University of Kalyani, India)
Dr. Malabika Basu (Dublin Institute of Technology, Ireland)
Dr. Tahseen Al-Doori (American University in Dubai)
Dr. V. K. Bhat (SMVD University, India)
Dr. Ranjit Abraham (Armia Systems, India)
Dr. Naomie Salim (Universiti Teknologi Malaysia)
Dr. Abdullah Ibrahim (Universiti Malaysia Pahang)
Dr. Charles McCorkell (Dublin City University, Ireland)
Dr. Neeraj Nehra (SMVD University, India)
Dr. Muhammad Nubli (Universiti Malaysia Pahang)
Dr. Zhenyu Y Angz (Florida International University, USA)
Dr. Keivan Navi (Shahid Beheshti University,
Preface
It is my proud privilege to welcome you all to the joint International Conferences
organized by IDES. This conference is jointly organized by the IDES and the As-
sociation of Computer Electrical Electronics and Communication Engineers
(ACEECom). The primary objective of this conference is to promote research and
developmental activities in Computer Science, Electrical, Electronics, Network,
Computational Engineering, and Communication. Another objective is to pro-
mote scientific information interchange between researchers, developers, engi-
neers, students, and practitioners working in India and abroad.
I am very excited to see research papers from various parts of the world. These proceedings bring together research papers from diverse areas of Computer Science, Electrical, Electronics, Network, Computational Engineering, and Communication. This conference is intended to provide a common platform for researchers, academicians and professionals to present their ideas and innovative practices and to explore future trends and applications in the field of Science and Engineering. The conference also provides a forum for the dissemination of experts' domain knowledge. The papers included in the proceedings are peer-reviewed scientific and practitioners' papers, reflecting the variety of advances in Communication, Network, Electrical, Electronics, and Computing.
As Chief Editor of these joint conference proceedings, I would like to thank all of the presenters who made this conference so interesting and enjoyable. Special thanks are also due to the session chairs and the reviewers who gave their time to evaluate the record number of submissions. To all of the members of the various committees, I owe a great debt, as this conference would not have been possible without their constant efforts. We hope that all readers enjoy these selections as much as we enjoyed the conference.
Dr. B Mathivanan
Sri Ramakrishna Engineering College, India
Table of Contents
Foreword | xv
Pawan Kumar Singh, Iman Chatterjee, Ram Sarkar and Mita Nasipuri
Handwritten Script Identification from Text Lines | 1
Neelotpal Chakraborty, Samir Malakar, Ram Sarkar and Mita Nasipuri
A Rule based Approach for Noun Phrase Extraction from English Text
Document | 13
Jaya Gera and Harmeet Kaur
Recommending Investors using Association Rule Mining for Crowd Funding
Projects | 27
P. S. Hiremath and Rohini A. Bhusnurmath
Colour Texture Classification Using Anisotropic Diffusion and Wavelet
Transform | 44
I.Thamarai and S. Murugavalli
Competitive Advantage of using Differential Evolution Algorithm for Software
Effort Estimation | 62
Shilpa Gopal and Dr. Padmavathi.S
Comparative Analysis of Cepstral analysis and Autocorrelation Method for
Gender Classification | 76
P Ravinder Kuma, Dr Sandeep.V.M and Dr Subhash S Kulkarni
A Simulative Study on Effects of Sensing Parameters on Cognitive Radio’s
Performance | 90
Priyanka Parida, Tejaswini P. Deshmukh, and Prashant Deshmukh
Analysis of Cyclotomic Fast Fourier Transform by Gate level Delay
Method | 104
Liji P I and Bose S
Dynamic Resource Allocation in Next Generation Networks using FARIMA Time
Series Model | 112
Ms Shanti Swamy, Dr.S.M.Asutkar and Dr.G.M.Asutkar
Classification of Mimetite Spectral Signatures using Orthogonal Subspace
Projection with Complex Wavelet Filter Bank based Dimensionality
Reduction | 126
Sharmila Kumari M, Swathi Salian and Sunil Kumar B. L
An Illumination Invariant Face Recognition Approach based on Fourier
Spectrum | 132
Arlene Davidson R and S. Ushakumari
Optimal Load Frequency Controller for a Deregulated Reheat Thermal Power
System | 144
Chandana B R and A M Khan
Design and Implementation of a Heuristic Approximation Algorithm for
Multicast Routing in Optical Networks | 155
Sneha Sharma and P Beaulah Soundarabai
Infrastructure Management Services Toolkit | 167
Divyansh Goel, Agam Agarwal and Rohit Rastogi
A Novel Approach for Residential Society Maintenance Problem for Better
Human Life | 177
H. Kavitha, Montu Singh, Samrat Kumar Rai and Shakeelpatel Biradar
Smart Suspect Vehicle Surveillance System | 186
Takahito Kimura and Shin-Ya Nishizaki
Formal Performance Analysis of Web Servers using an SMT Solver and a Web
Framework | 195
Jisha P Abraham and Dr.Sheena Mathew
Modified GCC Compiler Pass for Thread-Level Speculation by Modifying the
Window Size using Openmp | 205
Liu and Baiocchi
Overview and Evaluation of an IoT Product for Application Development | 213
A.Senthamaraiselvan and Ka.Selvaradjou
A TCP in CR-MANET with Unstable Bandwidth | 224
Morande Swapnil and Tewari Veena
Impact of Digital Ecosystem on Business Environment | 233
Narayan Murthy
A Two-Factor Single Use Password Scheme | 242
Dr.Ramesh k
Design & Implementation of Wireless System for Cochlear Devices | 248
Gurunadha Rao Goda and Dr. Avula Damodaram
Software Code Clone Detection and Removal using Program Dependence
Graphs | 256
Dileep Kumar G., Dr. Vuda Sreenivasa Rao, Getinet Yilma and
Mohammed Kemal Ahmed
Social Sentimental Analytics using Big Data Tools | 266
J. Prakash and A. Bharathi
Predicting Flight Delay using ANN with Multi-core Map Reduce
Framework | 280
Dr.Ramesh K, Dr. Sanjeevkumar K.M and Sheetalrani Kawale
New Network Overlay Solution for Complete Networking Virtualization | 288
Konda.Hari Krishna, Dr.Tapus Kumar, Dr.Y.Suresh Babu, N.Sainath and
R.Madana Mohana
Review upon Distributed Facts Hard Drive Schemes throughout Wireless Sensor
Communities | 297
Mohd Maroof Siddiqui, Dr. Geetika Srivastava, Prof (Dr) Syed Hasan Saeed and
Shaguftah
Detection of Rapid Eye Movement Behaviour Sleep Disorder using Time and
Frequency Analysis of EEG Signal Applied on C4-A1 Channel | 310
Komal Sunil Deokar and Rajesh Holmukhe
Analysis of PV/ WIND/ FUEL CELL Hybrid System Interconnected With Electrical
Utility Grid | 327
Lipika Nanda and Pratap Bhanu Mishra
Analysis of Wind Speed Prediction Technique by hybrid Weibull-ANN
Model | 337
K.Navatha, Dr. J.Tarun Kumar and Pratik Ganguly
An efficient FPGA Implementation of DES and Triple-DES Encryption
Systems | 348
Sunil Kumar Jilledi and Shalini J
A Novelty Comparison of Power with Assorted Parameters of a Horizontal Wind
Axis Turbine for NACA 5512 | 357
Naghma Khatoon and Amritanjali
Retaliation based Enhanced Weighted Clustering Algorithm for Mobile Ad-hoc
Network (R-EWCA) | 365
Dr.K.Meenakshi Sundaram and Sufola Das Chagas Silva Araujo
Chest CT Scans Screening of COPD based Fuzzy Rule Classifier
Approach | 373
Author Index | 385
Foreword
The Institute of Doctors, Engineers and Scientists (IDES) (with an objective to promote research and development activities in the fields of Science, Medicine, Engineering and Management) and the Association of Computer Electrical Electronics and Communication Engineers (ACEECom) (with an objective to disseminate knowledge and to promote research and development activities in the field of engineering and technology) have joined hands for the benefit of society. For more than a decade, both IDES and ACEECom have been well established in organizing conferences and publishing journals.
This joint International Conference, organized by IDES and ACEECom in 2016, aims to bring together professors, researchers, and students in all areas of Computer Science, Information Technology, Computational Engineering, Communication, Signal Processing, Power Electronics, Image Processing, etc. on one platform, where they can interact and share ideas.
A total of 35 eminent scholars/speakers have registered their papers in the areas of Computer Science and the Electrical & Electronics disciplines. These papers are published in these proceedings by De Gruyter Digital Library and will surely be an eye-opener to the world for further research in this area.
The organizations (IDES and ACEECom) will come before you together again in the future to bring further exposure to this unending research.
Dr. Rajesh R
Central University of Kerala, India
Pawan Kumar Singh¹, Iman Chatterjee², Ram Sarkar³ and Mita Nasipuri⁴
Handwritten Script Identification from Text Lines
Abstract: In a multilingual country like India, where 12 different official scripts are in use, automatic identification of handwritten scripts facilitates many important applications, such as automatic transcription of multilingual documents, searching for documents containing a particular script on the web or in digital archives, and selection of a script-specific Optical Character Recognition (OCR) system in a multilingual environment. In this paper, we propose a robust method for identifying scripts in handwritten documents at text-line level. The recognition is based on features extracted using the Chain Code Histogram (CCH) and the Discrete Fourier Transform (DFT). The proposed method is evaluated on 800 handwritten text lines written in seven Indic scripts, namely Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu and Urdu, along with Roman script, and yields an average identification rate of 97.03% using a Support Vector Machine (SVM) classifier.
Keywords: Script Identification, Handwritten text lines, Indic scripts, Chain
Code Histogram, Discrete Fourier Transform, Multiple Classifiers
¹ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India, pawansingh.ju@gmail.com
² Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India, imanchatterjee9@gmail.com
³ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India, raamsarkar@gmail.com
⁴ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India, mitanasipuri@gmail.com
1 Introduction
One of the major research thrusts in Document Image Analysis is the implementation of OCR algorithms that convert the alphanumeric characters present in a digitized document into machine-readable form. Examples of applications of such research include automated word recognition, bank check processing, and address sorting in postal applications. However, most of the OCR algorithms used in these applications are selected based on a priori knowledge of the script and/or language of the document under analysis. This assumption requires human intervention to select the appropriate OCR algorithm, limiting the possibility of fully automating the analysis process, especially in a truly multilingual environment. In such a scenario, a script identification module is needed before a document can be passed to the appropriate OCR system.
In general, script identification can be performed at any of three levels: (a) page level, (b) text-line level and (c) word level. Compared with page- or word-level identification, script recognition at text-line level in a multi-script document may be much more challenging, but it has its own advantages. To identify the script type reliably, a certain amount of textual data is needed. However, identifying the script of words that contain only a few characters may not always be feasible, because at word level the characters present in a single word may not be sufficiently informative. In addition, word-level script identification requires exact segmentation of the text into words, which is itself a demanding task. On the other hand, identifying scripts at page level can sometimes be too convoluted and time-consuming. It is therefore preferable to perform script identification at text-line level rather than at either of the other two levels.
A detailed survey of the state of the art in Indic script identification by P. K. Singh et al. [1] shows that most of the reported studies [2-8] that perform script identification at text-line level work on printed text documents. G. D. Joshi et al. [2] proposed a hierarchical script classifier that uses a two-level, tree-based scheme for identifying 10 printed Indic scripts, namely Bangla, Devanagari, Gujarati, Gurumukhi, Kannada, Malayalam, Oriya, Tamil and Urdu, including Roman script. Three feature sets, namely statistical, local and horizontal-profile features, are extracted from the normalized energy of log-Gabor filters designed at 8 equi-spaced orientations (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135° and 157.5°) and at an empirically determined optimal scale. An overall
classification accuracy of 97.11% is obtained. M. C. Padma et al. [3] proposed a model based on top- and bottom-profile features to identify and separate text lines of Telugu, Devanagari and English scripts in a printed trilingual document. A set of eight features (i.e. bottom-max-row, top-horizontal-line, tick-component, bottom-component (extracted from the bottom portion of the input text line), top-pipe-size, bottom-pipe-size, top-pipe-density and bottom-pipe-density) is computed experimentally, and the overall accuracy of the system is found to be 99.67%. M. C. Padma et al. [4] also proposed a model to identify the script type of a trilingual document printed in Kannada, Hindi and English scripts. The distinct characteristic features of these scripts are studied thoroughly from the nature of their top and bottom profiles. A set of 4 features, namely profile_value (computed as the density of the pixels present at top_max_row and bottom_max_row), bottom_max_row_no (the value of the attribute bottom_max_row), coeff_profile and top_component_density (the density of the connected components at top_max_row), is computed. Finally, a k-NN (k-Nearest Neighbor) classifier is used to classify the test samples, with an average recognition rate of 99.5%. R. Gopakumar et al. [5] described a zone-based structural feature extraction algorithm for the recognition of South Indic scripts (Kannada, Telugu, Tamil and Malayalam) along with English and Hindi. A set of 9 features, namely the numbers of horizontal lines, vertical lines, right diagonals and left diagonals, the normalized lengths of horizontal lines, vertical lines, right diagonals and left diagonals, and the normalized area of the line image, is computed for each text-line image. Classification accuracies of 100% and 98.3% are achieved using k-NN and SVM (Support Vector Machine), respectively. M. Jindal et al. [6] proposed an approach for identifying Indic scripts at text-line level based on features extracted using the Discrete Cosine Transform (DCT) and Principal Component Analysis (PCA). The method is tested on printed document images in 11 major Indian languages (viz., Bangla, Hindi, Gujarati, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu, English and Urdu), and 95% recognition accuracy is obtained. R. Rani et al. [7] demonstrated the effectiveness of Gabor filter banks with k-NN, SVM and PNN (Probabilistic Neural Network) classifiers for identifying scripts at text-line level in trilingual documents printed in Gurumukhi, Hindi and English. Their experiments show that a set of 140 Gabor-filter-based features with an SVM classifier achieves the maximum recognition rate of 99.85%. I. Kaur et al. [8] presented a method for identifying English and Punjabi scripts at text-line level using headline and character-density features. The approach is tested thoroughly on images with different font sizes, and an average accuracy of 90.75% is
achieved.
On the other hand, only a few studies have been reported on handwritten documents. M. Hangarge et al. [9] investigated texture patterns as a tool for determining the script of a handwritten document image, based on the observation that text has a distinct visual texture. A set of 13 spatial spread features of three scripts, namely English, Devanagari and Urdu, is extracted using morphological filters, and the overall accuracies of the proposed algorithm are found to be 88.67% and 99.2% for tri-script and bi-script classification, respectively, using a k-NN classifier. P. K. Singh et al. [10] proposed a texture-based approach for text-line-level identification of six handwritten scripts, namely Bangla, Devanagari, Malayalam, Tamil, Telugu and Roman. A set of 80 features based on the Gray Level Co-occurrence Matrix (GLCM) is used, and an overall recognition rate of 95.67% is achieved using a Multi Layer Perceptron (MLP) classifier. To the best of our knowledge, text-line-level script identification covering a large number of handwritten Indic scripts has not yet been reported in the literature. In this paper, we propose a text-line-level script identification technique for handwritten documents written in seven popular official Indic scripts, namely Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu and Urdu, along with Roman script.
2 Data Collection and Preprocessing
At present, no standard database of handwritten Indic scripts is available in the public domain. Hence, we created our own database of handwritten documents in the laboratory. The document pages for the database were collected from different persons on request under our supervision. The writers were asked to write on A4-size pages, without any constraint on the content of the textual material. The document pages are digitized at 300 dpi resolution and stored as grayscale images. The scanned images may contain noisy pixels, which are removed by applying a Gaussian filter [11]. It should be noted that each handwritten text line (in fact, an arbitrarily chosen portion of a line) may contain two or more words with noticeable intra- and inter-word spacing. Numerals that may appear in the text are not considered in the present work. It is ensured that at least 50% of each cropped text line contains text. A sample snapshot of text-line images written in eight different scripts is shown in Fig. 1. Otsu's global thresholding approach [12] is used to convert the images into two-tone (binary) images. However, the dots and punctuation marks appearing in the text lines are not eliminated, since these may also contribute to the features of the respective scripts. Finally, a total of 800 handwritten text-line images are considered, with exactly 100 text lines per script.
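Otsu's method [12] chooses the global threshold that maximizes the between-class variance of the gray-level histogram. The following is a minimal pure-Python sketch, assuming an 8-bit grayscale image stored as a list of rows with ink darker than paper; the function names are hypothetical, the Gaussian denoising step is omitted, and this is not the authors' implementation.

```python
def otsu_threshold(gray):
    """Return the gray level t maximizing the between-class variance of
    the classes {v <= t} and {v > t} for an 8-bit image (list of rows)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels with value <= t
    sum0 = 0   # sum of their gray levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray):
    """Two-tone image: 1 for dark (ink) pixels, 0 for light (paper) pixels."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

For example, `binarize([[10, 10, 200], [200, 10, 200]])` marks the three dark pixels with 1. In practice a library routine, such as OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag, would normally be used instead.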
3 Feature Extraction
The feature extraction is based on a combination of the Chain Code Histogram (CCH) and the Discrete Fourier Transform (DFT), which are described in detail in the following subsections.
Figure 1. Sample text line images taken from our database written in: (a) Gujarati, (b) Kannada,
(c) Malayalam, (d) Oriya, (e) Tamil, (f) Telugu, (g) Urdu, and (h) Roman scripts respectively
3.1 Chain Code Histogram
Chain codes [11] are used to represent a boundary by a connected sequence of straight-line segments of specified length and direction. A chain code describes the movement along a digital curve or a sequence of pixels based on their connectivity. Two types of chain codes are possible, depending on whether four or eight neighbors of a pixel are considered, giving rise to 4- and 8-neighbourhoods. The corresponding codes are the 4-directional code and the 8-directional code, respectively. The direction of each segment is coded using a numbering scheme as shown in Fig. 2. In the present work, the boundaries of handwritten text lines written in different scripts are traced, and each boundary step is assigned the number corresponding to its direction. Thus, the boundary of each text line is reduced to a sequence of numbers. A boundary code formed as a sequence of such directional numbers is referred to as a Freeman chain code.
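The coding step can be sketched as follows, assuming that boundary tracing (for example, Moore-neighbour tracing) has already produced an ordered list of 8-connected (row, col) boundary pixels, and that Fig. 2 uses the usual counter-clockwise numbering starting from east, which agrees with the angles listed in Table 1. The names `FREEMAN_8` and `freeman_chain_code` are hypothetical.

```python
# Offsets (d_row, d_col) for the 8 Freeman directions, numbered
# counter-clockwise from east (0 = 0 deg, 1 = 45 deg, ..., 7 = -45 deg).
# Rows grow downwards in image coordinates, so "north" is d_row = -1.
FREEMAN_8 = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}


def freeman_chain_code(boundary, closed=True):
    """Convert an ordered list of 8-connected (row, col) boundary pixels
    into its 8-directional Freeman chain code."""
    points = list(boundary)
    if closed and len(points) > 1:
        points.append(points[0])  # step back to the starting pixel
    codes = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in FREEMAN_8:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(FREEMAN_8[step])
    return codes
```

For the 2 × 2 square traced as (0, 0), (0, 1), (1, 1), (1, 0), the function returns [0, 6, 4, 2].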
Figure 2. Illustration of numbering the directions for: (a) 4-directional, and (b) 8-directional chain codes
The histogram of the Freeman chain codes is taken as feature values F1-F8, and the histogram of the first difference of the chain codes is taken as feature values F9-F15. Let R denote the set of pixels of a region. The perimeter of R is the number of pixels present in its boundary; in a binary image, the perimeter is the number of foreground pixels that touch the background. For an 8-directional code, the perimeter length of each text line (F16) is calculated as |P| = (even count) + √2 × (odd count), where the even and odd counts are the numbers of even and odd chain codes, respectively. A circularity measure (F17) proposed by Haralick [13] can be written as:
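$$C = \frac{\mu_R}{\sigma_R} \tag{1}$$
where $\mu_R$ and $\sigma_R$ are the mean and standard deviation of the distance from the centroid of the shape to the shape boundary, and can be computed as follows:
$$\mu_R = \frac{1}{K} \sum_{k=0}^{K-1} \left\| (r_k, c_k) - (\bar{r}, \bar{c}) \right\| \tag{2}$$
$$\sigma_R = \left( \frac{1}{K} \sum_{k=0}^{K-1} \left[ \left\| (r_k, c_k) - (\bar{r}, \bar{c}) \right\| - \mu_R \right]^2 \right)^{1/2} \tag{3}$$
where $(\bar{r}, \bar{c})$ is the centroid of the region and the pixels $(r_k, c_k)$, $k = 0, \ldots, K-1$, lie on the perimeter P of the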
region. The circularity measure increases monotonically as the digital shape becomes more circular, and it takes similar values for digital and continuous shapes. Along with circularity, the slopes are labeled according to their chain codes, as shown in Table 1.
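Features F1-F17 can be computed from a chain code roughly as below. This is a sketch under stated assumptions, not the authors' code: histograms are raw counts, the first difference is taken circularly as (c_i − c_{i−1}) mod 8 (the text lists only seven first-difference features, F9-F15, without saying which bin is dropped, so all eight bins are returned), the centroid in Eqs. (2)-(3) is taken over all region pixels, and the standard deviation is the population one. All function names are hypothetical.

```python
import math


def chain_code_histogram(codes):
    """F1-F8: how often each Freeman direction 0..7 occurs."""
    hist = [0] * 8
    for c in codes:
        hist[c] += 1
    return hist


def first_difference(codes):
    """Circular first difference of a closed chain code: the number of
    45-degree counter-clockwise turns from the previous move to each move."""
    return [(codes[i] - codes[i - 1]) % 8 for i in range(len(codes))]


def first_difference_histogram(codes):
    """Histogram of the first-difference values 0..7."""
    return chain_code_histogram(first_difference(codes))


def perimeter_length(codes):
    """F16: |P| = (#even codes) + sqrt(2) * (#odd codes)."""
    odd = sum(1 for c in codes if c % 2 == 1)
    even = len(codes) - odd
    return even + math.sqrt(2) * odd


def haralick_circularity(region, boundary):
    """F17: C = mu_R / sigma_R, Eqs. (1)-(3).

    `region` holds all (row, col) pixels of the shape (used for the
    centroid) and `boundary` the K pixels lying on its perimeter.
    A perfectly uniform distance profile (sigma_R = 0) returns infinity.
    """
    r_bar = sum(r for r, _ in region) / len(region)
    c_bar = sum(c for _, c in region) / len(region)
    dists = [math.hypot(r - r_bar, c - c_bar) for r, c in boundary]
    k = len(dists)
    mu = sum(dists) / k
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / k)
    return math.inf if sigma == 0 else mu / sigma
```

For instance, for the chain code [0, 6, 4, 2] the perimeter length is 4.0, and every first difference equals 6, i.e. a 90° clockwise turn at each corner.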
Table 1. Labeling of slope angles according to their chain codes
Chain code   0    1     2     3      4      5       6      7
θ            0°   45°   90°   135°   180°   -135°   -90°   -45°
The counts of slopes having θ values of 0°, |45°|, |90°|, |135°| and 180° in each handwritten text-line image are taken as feature values F18-F22.
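Since each chain-code step carries one of the slope labels in Table 1, features F18-F22 reduce to counting codes by |θ|. A minimal sketch with hypothetical names; treating every chain-code step as one slope, with +θ and −θ folded together following the |·| notation, is this sketch's reading of the text rather than a detail it states.

```python
# Slope angle (degrees) attached to each Freeman code, as in Table 1.
SLOPE_DEG = {0: 0, 1: 45, 2: 90, 3: 135, 4: 180, 5: -135, 6: -90, 7: -45}


def slope_features(codes):
    """F18-F22: counts of slopes with theta = 0, |45|, |90|, |135| and 180 degrees."""
    bins = [0, 45, 90, 135, 180]
    counts = dict.fromkeys(bins, 0)
    for c in codes:
        counts[abs(SLOPE_DEG[c])] += 1
    return [counts[b] for b in bins]
```

For example, the code sequence [0, 1, 7, 2, 6, 6, 3, 4] yields the counts [1, 2, 3, 1, 1].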
3.2 Discrete Fourier Transform
The Fourier Transform [11] is an important image processing tool that is used to decompose an image into its sine and cosine components. The output of the transformation represents the image in the Fourier (frequency) domain, while the input image is its spatial-domain equivalent. In the Fourier-domain image, each point represents a particular frequency contained in the spatial-domain image.
The Discrete Fourier Transform (DFT) is the sampled Fourier Transform and therefore does not contain all frequencies forming an image, but only a set of samples that is large enough to fully describe the spatial-domain image. The number of frequencies corresponds to the number of pixels in the spatial-domain image, i.e., the images in the spatial and Fourier domains are of the same size. The DFT of a digital image of size M × N can be written as:
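$$F(u, v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)} \tag{4}$$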
where f(x, y) is the image in the spatial domain and the exponential term is the basis function corresponding to each point F(u, v) in the Fourier space. The value of each point F(u, v) is obtained by multiplying the spatial image with the corresponding basis function and summing the result. The Fourier Transform produces a complex-valued output, which can be displayed as two images, either the real and imaginary parts or the magnitude and phase; the magnitude indicates how strongly each frequency component contributes, and the phase indicates where that component is located in the image. The magnitude and phase components for a sample handwritten Tamil text-line image are shown in Fig. 3. In the current work, only the magnitude part of the DFT is employed, as it contains most of the information on the geometric structure of the spatial-domain image. This in turn makes it easy to examine or process particular frequencies of the image. The magnitude coefficients are normalized as follows:
$$\bar{F}(u, v) = \frac{|F(u, v)|}{\sum_{u'=0}^{M-1} \sum_{v'=0}^{N-1} |F(u', v')|} \tag{5}$$
The algorithm for feature extraction using DFT is as follows:
Step 1: Divide the input text-line image into n × n non-overlapping blocks, known as grids. The optimal value of n has been chosen as 4.
Step 2: Compute the DFT (by applying Eqn. (4)) in each of the grids.
Step 3: Extract only the magnitude part of the DFT and normalize it using Eqn. (5).
Step 4: Calculate the mean and standard deviation of the normalized magnitude in each of the grids, which gives a feature vector of 32 elements (F23-F54).
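Steps 1-4 can be sketched in pure Python as below. Assumptions: Eq. (5) is applied within each grid, grid boundaries are obtained by integer division (so blocks may differ in size by one pixel), an all-zero grid yields zero features, the standard deviation is the population one, and a direct O(M²N²) DFT is used for clarity where a real implementation would use an FFT such as `numpy.fft.fft2`. All function names are hypothetical.

```python
import cmath
import math


def dft2(block):
    """Direct 2-D DFT of an M x N block, Eq. (4), including the 1/(MN) factor."""
    m, n = len(block), len(block[0])
    out = [[0j] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            acc = 0j
            for x in range(m):
                for y in range(n):
                    angle = -2 * math.pi * (u * x / m + v * y / n)
                    acc += block[x][y] * cmath.exp(1j * angle)
            out[u][v] = acc / (m * n)
    return out


def normalized_magnitude(block):
    """Eq. (5): each |F(u, v)| divided by the sum of all magnitudes in the block."""
    mags = [abs(z) for row in dft2(block) for z in row]
    total = sum(mags)
    if total == 0:          # blank grid (no ink): all-zero spectrum
        return [0.0] * len(mags)
    return [mg / total for mg in mags]


def split_grid(image, n=4):
    """Step 1: cut an image (list of rows) into n x n non-overlapping blocks,
    in row-major order; cut positions are rounded down."""
    h, w = len(image), len(image[0])
    if h < n or w < n:
        raise ValueError("image is smaller than the grid")
    rows = [i * h // n for i in range(n + 1)]
    cols = [j * w // n for j in range(n + 1)]
    return [[row[cols[j]:cols[j + 1]] for row in image[rows[i]:rows[i + 1]]]
            for i in range(n) for j in range(n)]


def dft_grid_features(image, n=4):
    """Steps 2-4: mean and standard deviation of the normalized DFT magnitude
    of every grid, giving 2 * n * n values (32 for n = 4, i.e. F23-F54)."""
    features = []
    for block in split_grid(image, n):
        mags = normalized_magnitude(block)
        mean = sum(mags) / len(mags)
        std = math.sqrt(sum((x - mean) ** 2 for x in mags) / len(mags))
        features.extend([mean, std])
    return features
```

Under this per-grid reading, the normalized magnitudes of a non-blank grid sum to 1, so each grid's mean equals one over its pixel count, and the standard deviations carry the image-dependent information.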
Figure 3. Illustration of: (a) handwritten Tamil text-line image, (b) its magnitude component,
and (c) its phase component after applying DFT
4 Experimental Results and Discussion
The performance of the present script identification scheme is evaluated on a dataset of 800 preprocessed text-line images, as described in Section 2. For each set of 100 text-line images of a particular script, 65 images are used for training and the remaining 45 images are used for testing purposes. The proposed approach is evaluated using seven well-known classifiers, namely Naïve Bayes, Bayes Net, MLP, SVM, Random Forest, Bagging and MultiClass Classifier. The recognition performances and the corresponding scores achieved at the 95% confidence level are shown in Table 2.
Table 2. Recognition performances of the proposed script identification technique using seven
well-known classifiers (best case is shaded in grey and styled in bold)
Classifier              Success Rate (%)   95% confidence score (%)
Naïve Bayes             89.33              91.62
Bayes Net               90.09              93.27
MLP                     95.14              96.85
SVM                     97.03              99.7
Random Forest           94.6               97.39
Bagging                 91.25              93.54
MultiClass Classifier   92.74              95.52
As observed from Table 2, the SVM classifier produces the highest identification accuracy of 97.03%. A detailed error analysis of the SVM classifier is also carried out using the following well-known measures: Kappa statistic, mean absolute error, root mean square error, True Positive rate (TPR), False Positive rate (FPR), precision, recall, F-measure, Matthews Correlation Coefficient (MCC) and Area Under the ROC Curve (AUC). The Kappa statistic, mean absolute error and root mean square error of the SVM classifier for the present technique are 0.9661, 0.0074 and 0.0862, respectively. Table 3 provides a statistical performance analysis of the remaining measures for each of the aforementioned scripts.
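The per-class measures in Table 3 follow the usual one-vs-rest definitions. As a minimal sketch, assuming a confusion matrix with true classes on rows and predicted classes on columns (the paper does not state which toolkit computed its figures), they and the Kappa statistic can be obtained as follows; the function names are hypothetical.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest TPR, FPR, precision, F-measure and MCC for each class.

    cm[i, j] counts samples of true class i predicted as class j (assumed layout).
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    results = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k].sum() - tp      # class-k samples predicted as another class
        fp = cm[:, k].sum() - tp   # other samples predicted as class k
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0  # TP rate, identical to recall
        fpr = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f_measure = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
        denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        results.append({"tpr": tpr, "fpr": fpr, "precision": precision,
                        "recall": tpr, "f_measure": f_measure, "mcc": mcc})
    return results


def cohen_kappa(cm):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    p_observed = np.trace(cm) / total
    p_chance = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    return (p_observed - p_chance) / (1 - p_chance)
```

A class that is never confused with any other (as Gujarati and Oriya are in Table 3) gets TPR, precision, F-measure and MCC of 1 and FPR of 0 under these definitions.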
Table 3. Statistical performance measures along with their respective means (shaded in grey
and styled in bold) achieved by the proposed technique for eight handwritten scripts
Scripts TP rate FP rate Precision Recall F-measure MCC AUC
Gujarati 1.000 0.000 1.000 1.000 1.000 1.000 1.000
Kannada 0.970 0.025 0.845 0.970 0.903 0.891 0.972
Malayalam 0.950 0.000 1.000 0.950 0.975 0.972 0.975
Oriya 1.000 0.000 1.000 1.000 1.000 1.000 1.000
Tamil 0.990 0.000 1.000 0.990 0.995 0.994 0.995
Telugu 0.980 0.000 1.000 0.980 0.990 0.989 0.990
Urdu 0.941 0.004 0.969 0.941 0.955 0.949 0.968
Roman 0.931 0.004 0.969 0.931 0.949 0.943 0.963
Weighted Average 0.970 0.004 0.973 0.970 0.971 0.967 0.983
Although Table 2 shows encouraging results, some handwritten text lines are still misclassified. The main reasons are: (a) the presence of speckled noise, (b) skewed words in some text lines, and (c) irregular spaces within words, punctuation symbols, etc. Moreover, the structural resemblance between the character sets of some Indic scripts, such as Kannada and Telugu, and Malayalam and Tamil, produces similar contiguous pixel distributions, which in turn causes these scripts to be confused with each other. Fig. 4 shows some samples of misclassified text-line images.
Figure 4. Samples of text line images written in (a) Kannada, (b) Telugu, (c) Malayalam, and (d)
Tamil scripts misclassified as Telugu, Kannada, Tamil and Malayalam scripts respectively
Conclusion
In this paper, we have proposed a robust method for handwritten script identification at the text-line level for eight official scripts of India, with the aim of facilitating research on multilingual handwritten OCR. A set of 54 feature values is extracted using the combination of CCH and DFT. Experimental results show that an accuracy of 97.03% is achieved using the SVM classifier on a limited dataset of eight different scripts, which is quite acceptable given the complexities and shape variations of the scripts under consideration. In future, we plan to modify this technique to perform script identification from handwritten document images containing more Indian languages. Another focus is to increase the size of the database to incorporate larger variations of writing styles, which in turn would help establish our technique as writer independent.
Acknowledgment
The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and the Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of the Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The work reported here has been partially funded by the University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.
Neelotpal Chakraborty1, Samir Malakar2, Ram Sarkar3 and Mita Nasipuri4
A Rule based Approach for Noun Phrase
Extraction from English Text Document
Abstract: This paper presents an approach that is simple to implement yet efficient enough to extract Noun Phrases (NPs) from text documents written in English. The selected text documents are articles from reputed English newspapers of India, namely The Times of India, The Telegraph and The Statesman. Only one column (sports) has been considered. The proposed approach pursues the following objectives: first, to explore and exploit the grammatical features of the language; second, to prepare an updated stop list classified into conjunctions, prepositions, articles, common verbs and adjectives; third, to give special characters due importance.
Keywords: Noun Phrase, Rule-based Approach, Natural Language Processing,
Data Mining
1 Introduction
In the past few decades, the world has witnessed an explosion of text data in printed and/or handwritten form, and this growth is expected to continue exponentially over time. It is also well known that the time needed to search for a document grows with the size of the database to which it belongs. Such an abrupt increase in data therefore keeps increasing search time, whereas a technology-enabled society demands fast and efficient search. Search time can be reduced only when
1 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
neelotpal_chakraborty@yahoo.com
2 Department of Master of Computer Applications, MCKV Institute of Engineering, Howrah, India
malakarkarsamir@gmail.com
3 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
raamsarkar@gmail.com
4 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
mitanasipuri@gmail.com
14 | Neelotpal Chakraborty, Samir Malakar, Ram Sarkar and Mita Nasipuri
the data are organized using a proper structure. One way to accomplish this is document clustering, an application of Natural Language Processing (NLP). The job of NLP is to understand and analyze natural language (the language spoken by humans). The growing volume of documents has motivated researchers throughout the world to direct their research towards NLP. Document clustering is carried out through a sequence of processes comprising Noun Phrase (NP) extraction, Key Phrase (KP) selection and document ranking.
1.1 Noun Phrase, Key Phrase and Document Clustering
Any text document, irrespective of its content, contains certain terminology (words or phrases) by which, among several documents, that document can be identified (or classified) as describing a particular subject or topic. The process of assigning a text document to a predefined class or subject is known as document clustering. The terms used for tagging a text document with a predefined class are usually termed Keywords or KPs, each comprising a single word or multiple words. Traditionally, a Named Entity (NE), a special type of NP, is an obvious choice of KP.
To date, research on document clustering has mainly focused on developing statistical models to identify NPs, i.e., on identifying quantitative features [1]. However, certain aspects of any natural language require an understanding of its subjective/qualitative features. Each natural language has its own specific grammatical structure. In general, its vocabulary can be classified into two types, the closed class and the open class [1]. New words are frequently added to the open class, whereas words are rarely added to the closed class.
The NPs fall under the open class, whereas prepositions, articles, conjunctions and certain common verbs, adjectives and adverbs are of the closed class type. However, certain verbs, adverbs and adjectives are derived from nouns, for example: ( ) → ( ). Also, the word that precedes a particular word, such as a preposition, can determine its type. For example, consider the following sentences, where the word "take" is used as a noun in the first sentence and as a verb in the second.
Sentence 1: Just one take is enough for this scene.
Sentence 2: I have come to take my books.
A Rule based Approach for Noun Phrase Extraction from English Text Document | 15
It is worth noting that some verbs, adverbs or adjectives can be derived from nouns and vice versa. English also uses uppercase and lowercase letters, and capitalization adds to the relevance of terms in a particular text document. Therefore, in the present work, apart from maintaining separate stop lists at different levels, such derived words are converted into their respective noun forms before being considered for NP extraction. Moreover, the human brain is capable of understanding the subjective or aesthetic features of a natural language, but these cannot always be captured by statistical or probabilistic models. These characteristics of natural language therefore deserve special consideration. The proposed work extracts NPs from text documents by taking the aesthetic features of the English language into account.
2 Related Work
A number of works [2-19] in the literature aim to extract NPs from text documents. These works can broadly be classified into three categories: rule-based, statistical-model-based and hybrid approaches. The first category of works [2-7] is mainly employed when adequate data is unavailable; it uses a linguistic model of the language. The second category [1, 8-13] does not require linguistic information about the language; these methods are language independent but need sufficient data to work well. The third category [14-19] comprises hybrid approaches, in which linguistic features of the language are used along with statistical information to extract NPs. The present work belongs to the first category.
Rule-based approaches [2-7] have mainly followed two strategies: top-down [2-3] and bottom-up [4-7]. In the top-down works, sentences are successively divided until the NPs are extracted, whereas in the bottom-up works, words, the fundamental units of a sentence, are extracted first and rules are then applied to form phrases. The work in [2] considered an adaptive window of 7 words to extract NPs. The approach in [3] is based on sequence labeling and training by kernel methods that capture the non-linear relationships among the morphological features of the Tamil language. In [4], the authors used a morpheme-based augmented transition network to construct and detect NPs from words. The work described in [5] used a CRF-based mechanism with morphological and contextual features. Another method [6] first extracted the words from the sentence and then used a Finite State Machine to combine them into NPs using Marathi morphology. An N-gram-based machine translation mechanism was applied in [7] to extract NPs from English and French.
In statistical/quantitative models, the words are first extracted and then combined into phrases using a probabilistic model that draws on knowledge from large-scale data. The work described in [1] used a Hidden Markov Model (HMM) to extract NPs, while a probabilistic finite-state automaton was used in [8]. A Support Vector Machine (SVM) based method for NP extraction was used in [9]. The work described in [10] used a statistical natural language parser trained on non-medical domain text as an NP extractor. A feed-forward neural network with embedding, hidden and softmax layers [11], long short-term memory (LSTM) recurrent neural networks [12] and multiword expression identification using semantic clustering [13] have been introduced to parse sentences and tag the NPs therein.
Methods following the hybrid approach use rules to create a tagged corpus first and then a statistical model to confirm the NPs, or vice versa. The works described in [15-16] use a part-of-speech tagger to create a tagged corpus and then Artificial Immune Systems (AIS) to confirm the final list of NPs for English and Malayalam, respectively, whereas the work in [17] employed hand-crafted rules for corpus preparation and then memory-based learning for NP extraction from Japanese. The method in [18] first uses a rule-based approach to create a tagged corpus for training, followed by a multilayer perceptron (MLP) neural network and Fuzzy C-Means clustering. The work in [19] first employs an HMM to extract initial NPs and then uses rules to prune the final result.
3 Corpus Description
The corpus was prepared to conduct experiments on NP extraction from English text documents. It comprises news articles from the sports columns of well-known English newspapers: 50 such articles were collected from popular English newspapers in India, namely The Telegraph, The Times of India and The Statesman. The distribution of the documents by newspaper is shown in Fig. 1. The database contains a total of 20,378 words. A stop list has been prepared to include words that are highly frequent across all text documents. The stop list includes 49 prepositions, 26 conjunctions, 3 articles, 6 clitics and 682 other stop words, such as common verbs, adjectives and adverbs.
Fig 1. Distribution of the collected data from different newspapers
4 English Morphology
Morphology [1] describes the way in which small meaningful units of a language (morphemes) combine to form words. For example, the morphemes ball and s together make up the word balls. Similarly, the word players is made up of three morphemes: play, er and s. Morphemes can be broadly classified into two major classes: stems and affixes. The main meaning of a word is carried by its stem, while affixes provide additional meanings. Affixes can be further categorized as prefixes (preceding the stem, e.g., − ), suffixes (following the stem, e.g., − ), infixes (within the stem, e.g., − − ) and circumfixes (surrounding the stem, e.g., en − light − en).
English morphological processes are classified into four major types, namely inflection, derivation, cliticization and compounding, which are detailed, along with some related terminology, in the following subsections. These morphological processes are concatenative in nature.
4.1 Inflectional Morphology
In inflectional morphology, a stem combines with a grammatical morpheme to generate a word of the same class, serving a syntactic function such as agreement (see the Agreement subsection). The plural form of a noun is usually formed by adding s or es as a suffix to its singular form. English has a relatively small number of inflectional affixes, and only nouns, verbs and certain adjectives can be inflected. Table 1 contains some examples of noun inflection in the form of regular (suffix -s or -es) and irregular (different spelling) plurals.
Table 1. Regular and Irregular plurals
Morphological Class   Regular            Irregular
Singular form         player   ball      man    child
Plural form           players  balls     men    children
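The regular/irregular split in Table 1 maps naturally onto a lookup-then-rules design. The sketch below is illustrative only: `pluralize` and the irregular dictionary are hypothetical, and only a few common spelling rules are covered (-es after sibilants, consonant + y to -ies); real English has further exceptions.

```python
# Irregular plurals must be listed explicitly; this dictionary is a tiny hypothetical sample.
IRREGULAR_PLURALS = {"man": "men", "woman": "women", "child": "children", "mouse": "mice"}


def pluralize(noun):
    """Return the plural of a singular noun using a lookup, then regular suffix rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # -es after sibilant endings: match -> matches, box -> boxes
    if noun.endswith(("s", "sh", "ch", "x", "z")):
        return noun + "es"
    # consonant + y -> -ies: city -> cities (but day -> days)
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    # default regular plural: player -> players, ball -> balls
    return noun + "s"
```

Checking the lexicon first means irregular forms always win over the suffix rules, which is the same priority a rule-based NP extractor needs when normalizing plurals.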
Verbal inflection in English, on the other hand, is more complex than noun inflection. English generally has three types of verbs: main verbs (e.g., bowl, play), modal verbs (e.g., shall, can) and primary verbs (e.g., have, be). The majority of main verbs are regular: knowing only the stem, one can easily form the other forms by concatenating suffixes such as -s, -ed and -ing. Irregular verbs, however, have idiosyncratic inflectional forms. Some examples of the morphological forms of regular and irregular verbs are given in Table 2.
Table 2. Morphological forms of Regular / Irregular verbs
Morphological Class          Regular verbs         Irregular verbs
Stem                         play      kick        catch      hit       go
Third-person singular form   plays     kicks       catches    hits      goes
Present participle form      playing   kicking     catching   hitting   going
Past form                    played    kicked      caught     hit       went
Past participle form         played    kicked      caught     hit       gone
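Table 2 suggests a simple generator: build the regular forms by concatenation, then override them with any listed irregular forms. This is a minimal sketch under stated assumptions: the irregular table is a hypothetical sample copied from Table 2, and consonant doubling (hit to hitting) is not modeled, so hitting is listed explicitly.

```python
# Irregular forms from Table 2 (a tiny hypothetical sample; a real lexicon lists hundreds).
IRREGULAR_FORMS = {
    "catch": {"past": "caught", "past_participle": "caught"},
    "hit": {"present_participle": "hitting", "past": "hit", "past_participle": "hit"},
    "go": {"past": "went", "past_participle": "gone"},
}


def third_person_singular(stem):
    # -es after sibilants and -o (catches, goes); plain -s otherwise (plays, kicks)
    if stem.endswith(("s", "sh", "ch", "x", "z", "o")):
        return stem + "es"
    return stem + "s"


def inflect(stem):
    """Regular forms by concatenation, then overridden by any listed irregular forms."""
    forms = {
        "third_singular": third_person_singular(stem),
        "present_participle": stem + "ing",
        "past": stem + "ed",
        "past_participle": stem + "ed",
    }
    forms.update(IRREGULAR_FORMS.get(stem, {}))
    return forms
```

For example, `inflect("play")` yields plays, playing, played and played, while `inflect("go")` keeps the regular goes and going but takes went and gone from the irregular table.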
4.2 Derivational Morphology
In derivational morphology, a stem combines with a grammatical morpheme to generate a word of a different class. The class of the resulting word is difficult to determine automatically, and derivation is considerably more complex than inflection. In English, new nouns are often derived from verbs or adjectives; such derivation is termed nominalization [1]. Some examples of derivation are given in Table 3.
Table 3. Example of Different Derivations
Stem          Stem type   Suffix   Derived noun     Derived adjective
organization  Noun        -al      -                organizational
spine         Noun        -less    -                spineless
modernize     Verb        -ation   modernization    -
appoint       Verb        -ee      appointee        -
bowl          Verb        -er      bowler           -
depend        Verb        -able    -                dependable
sharp         Adjective   -ness    sharpness        -
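Because the class of a derived word is hard to determine automatically, a rule-based system can at best flag candidate derivations by suffix. The following sketch takes its suffix lists from Table 3; the function name and the minimum stem length are hypothetical choices. It returns the first matching suffix and the class that suffix usually produces, and it will also fire on words such as never, which illustrates the ambiguity noted above.

```python
# Suffix lists taken from Table 3; longer suffixes are tried first within each list.
NOUN_SUFFIXES = ("ation", "ness", "ee", "er")
ADJECTIVE_SUFFIXES = ("able", "less", "al")


def derivational_suffix(word, min_stem=3):
    """Return (suffix, derived_class) for the first matching suffix, else None.

    min_stem is a hypothetical guard so that short words are not split.
    """
    w = word.lower()
    for suffixes, derived_class in ((NOUN_SUFFIXES, "noun"), (ADJECTIVE_SUFFIXES, "adjective")):
        for suffix in sorted(suffixes, key=len, reverse=True):
            if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
                return suffix, derived_class
    return None
```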
4.3 Cliticization
Cliticization combines a stem with a clitic, a reduced form of a syntactic word that behaves like a morpheme (e.g., have is reduced to 've). The resulting string often acts like a pronoun, article, conjunction or verb. Clitics may precede or follow a word: in the former case the clitic is called a proclitic, and in the latter an enclitic. Some examples are given in Table 4.
Table 4. Examples of Clitics and their full forms
Actual form   am   are   have   not   will
Clitic form   'm   're   've    n't   'll
In English, the usage of clitics is often ambiguous. For example, he'd can be expanded to he had or he would. However, the apostrophe makes it simple to segment an English clitic from its host word.
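Since the apostrophe marks the boundary, segmenting a clitic is a simple string operation, while choosing its expansion may not be. A minimal sketch built on Table 4 (the helper name `split_clitic` is hypothetical) returns every candidate expansion and leaves disambiguation to later stages. The possessive/contracted 's and irregular negatives such as won't and can't are not handled and would need their own entries.

```python
# Clitic expansions from Table 4; "'d" is ambiguous (he'd = he had / he would).
CLITIC_EXPANSIONS = {
    "'m": ["am"],
    "'re": ["are"],
    "'ve": ["have"],
    "'ll": ["will"],
    "'d": ["had", "would"],
}


def split_clitic(token):
    """Return (host, candidate expansions); the list is empty if no known clitic is found."""
    t = token.replace("\u2019", "'").replace("\u2018", "'")  # normalize curly apostrophes
    if t.lower().endswith("n't"):
        return t[:-3], ["not"]
    pos = t.rfind("'")
    if pos > 0:
        clitic = t[pos:].lower()
        if clitic in CLITIC_EXPANSIONS:
            return t[:pos], list(CLITIC_EXPANSIONS[clitic])
    return t, []
```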
4.4 Compounding
Multiple stems are sometimes combined to generate a new word. For example, the word oversee is generated by combining the stems over and see.
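Recognizing a two-stem compound amounts to finding a split point where both halves are known stems. The sketch below assumes a hypothetical stem vocabulary and returns the first such split; strings with several valid splits, or compounds of more than two stems, would need additional rules.

```python
def split_compound(word, vocabulary):
    """Return the first (left, right) split whose halves are both known stems, else None."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in vocabulary and right in vocabulary:
            return left, right
    return None
```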
4.5 Agreement
In English, the noun and the main verb are required to agree in number; hence, plural markings are important. Markings are also required to signify gender. English has masculine and feminine genders to represent male and female, respectively; other nouns denote objects or things that cannot be classed as male or female. When the number of such classes is very large, they are usually referred to as noun classes instead of genders.
5 Proposed System
The work described here is a rule-based mechanism to extract NPs from a text document. It first accepts the whole text document and extracts sentences from it, using the full stop as delimiter. The extracted sentences are passed through two modules. The first module, the Phrase Extractor, splits each sentence into a number of simple-sentence-like phrases and then continues to extract fundamental phrases. The splitting delimiters are punctuation and bracket symbols, conjunctions, prepositions and other stop words. The final list of phrases becomes the input to the second module, which is designed to finalize the NPs from the set of phrases. Since sentences are successively divided into smaller units, the present work follows a top-down approach to NP extraction. The modules are detailed in the following subsections.
5.1 Phrase Extractor
In this phase, a text document is broken down into a list of phrases. The module first splits the sentences into simple-sentence-like phrases and then extracts fundamental phrases from them. Notably, some stop words are ambiguous. For example, in Jammu and Kashmir, the name of an Indian state, and cannot be treated as a splitting conjunction since it joins two nouns to form a single name. The issue of uppercase and lowercase letters is also prevalent in English: the first letter of a sentence is uppercase in most cases, yet a sentence may begin with a stop word, a number or a symbol, which is not a noun. Such issues are addressed here during simple-sentence-like phrase extraction and also during NP selection (described in the NP Finalization section). The detailed phrase extraction mechanism is described in Algorithm 1.
Algorithm 1:
Input: A text document
Output: List of phrases
Step 1: Extract sentences from text document using full stop as delimiter
Step 2: If sentence’s first character = Upper case letter
{
If (first word is a stop word)
{
Change the first character to lower case
}
Else
{
No change
}
}
Step 3: Split each sentence into sub sentences using punctuation and
bracket symbols as delimiters.
Step 4: Split each sub sentence into parts using conjunction as delimiter
If conjunction is ‘and’:
If string before ‘and’ has verb/verb phrase
{
If string after ‘and’ has verb/verb phrase
{
Split using ‘and’ as delimiter
}
Else
{
No change
}
}
Else
{
No change
}
Step 5: Split each part into sub-parts using preposition as delimiter
Step 6: Split sub-parts using clitics to get sub sub-parts
Step 7: Split sub sub-parts into phrases using other stop words such as
common verbs, common adjectives and adverbs, pronouns as
delimiters.
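Algorithm 1 can be approximated in Python as below. This is a sketch under stated assumptions, not the authors' implementation: the stop lists are tiny hypothetical samples, whitespace tokenization is assumed, a token ending in a clitic is treated as a delimiter in Step 6, and "has verb/verb phrase" in Step 4 is approximated by membership in a small verb list. Leading articles and conjunctions are deliberately left in place for Algorithm 2 to prune.

```python
import re

# Hypothetical, heavily truncated stop lists for illustration only.
ARTICLES = {"the", "a", "an"}
CONJUNCTIONS = {"and", "but", "or", "that", "because"}
PREPOSITIONS = {"of", "in", "during", "for", "with", "on", "at", "from", "by"}
CLITICS = ("'m", "'re", "'ve", "n't", "'ll", "'d")
VERBS = {"is", "are", "was", "were", "has", "have", "been", "be", "feels", "will", "help"}
OTHER_STOP = VERBS | {"he", "she", "it", "they", "his", "her", "only", "very"}
ALL_STOP = ARTICLES | CONJUNCTIONS | PREPOSITIONS | OTHER_STOP


def has_verb(tokens):
    return any(t.lower() in VERBS for t in tokens)


def split_tokens(tokens, is_delimiter):
    """Split a token list wherever is_delimiter(token) is true; delimiters are dropped."""
    parts, current = [], []
    for tok in tokens:
        if is_delimiter(tok):
            if current:
                parts.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        parts.append(current)
    return parts


def split_on_conjunctions(tokens):
    """Step 4: every conjunction splits, except 'and', which needs verbs on both sides."""
    parts, current = [], []
    for i, tok in enumerate(tokens):
        word = tok.lower()
        if word in CONJUNCTIONS and (word != "and" or (has_verb(current) and has_verb(tokens[i + 1:]))):
            if current:
                parts.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        parts.append(current)
    return parts


def is_clitic(tok):
    t = tok.lower().replace("\u2019", "'").replace("\u2018", "'")
    return t.endswith(CLITICS)


def extract_phrases(text):
    phrases = []
    for sentence in (s.strip() for s in text.split(".")):  # Step 1: full stop as delimiter
        if not sentence:
            continue
        # Step 2: lower-case a capitalized first word if it is a stop word
        if sentence[0].isupper() and sentence.split()[0].lower() in ALL_STOP:
            sentence = sentence[0].lower() + sentence[1:]
        for sub in re.split(r"[,;:!?()\[\]{}]", sentence):  # Step 3: punctuation/brackets
            for part in split_on_conjunctions(sub.split()):  # Step 4: conjunctions
                for sub_part in split_tokens(part, lambda t: t.lower() in PREPOSITIONS):  # Step 5
                    for piece in split_tokens(sub_part, is_clitic):  # Step 6: clitics
                        for phrase in split_tokens(piece, lambda t: t.lower() in OTHER_STOP):  # Step 7
                            phrases.append(" ".join(phrase))
    return phrases
```

Run on the example sentence in Section 6, this sketch yields the intermediate phrases "the former Australian batsman", "a part", "the South African support staff", "the World Cup", "Australia and New Zealand", "and T20 captain Faf Du Plessis", "presence" and "youngsters". The "and" inside "Australia and New Zealand" survives because no verb follows it.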
5.2 NP Finalization
The list obtained from the first module (Phrase Extractor) may contain phrases with stop words attached at the beginning or the end. Furthermore, a phrase itself may not be an NP. Therefore, a purifying mechanism has been designed to confirm the final list of NPs from the phrase list, as described in Algorithm 2.
Algorithm 2:
Input: List of phrases
Output: List of NPs
For (all phrases in the list)
{
Step 1: If phrase contains only stop word(s) or unwanted symbol(s),
delete phrase.
Step 2: If phrase starts with an article or upper case letter, no change.
Step 3: If phrase contains stop word or unwanted symbol at the begin-
ning or end, prune it from the phrase.
Step 4: If a word in the phrase has a suffix and does not start with a capital letter, split the phrase.
}
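Steps 1 and 3 of Algorithm 2 can be sketched as follows, again with a hypothetical stop list (the paper's full list has 766 entries). Steps 2 and 4 are not implemented here because the paper does not specify what "no change" exempts a phrase from or which suffixes trigger a split; stripping punctuation glued to the last word (e.g., youngsters.) is an added assumption.

```python
import string

# Hypothetical stop list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "of", "in", "during",
              "his", "her", "has", "been", "is", "will", "only"}


def is_noise(token):
    """A token is noise if it is a stop word or made up solely of punctuation symbols."""
    return token.lower() in STOP_WORDS or all(ch in string.punctuation for ch in token)


def finalize_nps(phrases):
    nps = []
    for phrase in phrases:
        tokens = phrase.split()
        # Step 1: delete phrases consisting only of stop words or unwanted symbols
        if all(is_noise(t) for t in tokens):
            continue
        # Step 3: prune noise only at the beginning and end, so inner words survive
        # (e.g. the "and" in "Australia and New Zealand")
        while is_noise(tokens[0]):
            tokens.pop(0)
        while is_noise(tokens[-1]):
            tokens.pop()
        # Assumption: strip punctuation glued to the last word ("youngsters." -> "youngsters")
        tokens[-1] = tokens[-1].rstrip(string.punctuation)
        nps.append(" ".join(tokens))
    return nps
```

Because Step 1 guarantees at least one non-noise token, the pruning loops can never empty a phrase.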
6 Result and Discussion
For experimental purposes, 50 text documents were collected from 3 popular newspapers. The details of the data have already been given in the Corpus Description section. The intermediate and final results of the designed process are illustrated below using a sample sentence. Note that we do not consider articles or pronouns.
The ground truth for the quantitative evaluation was prepared manually: all valid NPs in the documents were first selected by hand. The designed process was then applied to the same text documents, and the human-generated NP list was compared with the list produced by the proposed system. The final result is quantified using recall, precision and F-measure. The average recall, precision and F-measure over the 50 text documents are 97%, 74% and 84%, respectively. Sample results are given in Table 5.
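Each row of Table 5 follows from its TP, FP and FN counts: precision = TP/(TP+FP), recall = TP/(TP+FN), and the F-measure is their harmonic mean. A minimal sketch (the function name is hypothetical):

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure (harmonic mean of the two) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure
```

For document 1 (TP = 165, FP = 57, FN = 0), this gives 0.743, 1.0 and 0.853 after rounding to three decimals, matching the first row of Table 5.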
Example
Sentence: The former Australian batsman has been a part of the South African
support staff during the World Cup in Australia and New Zealand, and T20 captain
Faf Du Plessis feels that his presence will only help youngsters.
Fig 2. Successive breaking/splitting of sample text/sentence to get NPs
Desired result (NP list):
1. former Australian batsman
2. part
3. South African support staff
4. World Cup
5. Australia and New Zealand
6. T20 captain Faf Du Plessis
7. presence
8. youngsters
Result from the proposed system:
1. former Australian batsman
2. South African support staff
3. World Cup
4. Australia and New Zealand
5. T20 captain Faf Du Plessis
6. presence
7. youngsters
Table 5. Detailed results for 5 sample text documents
Document No. | No. of Words | TP | FP | FN | Precision | Recall | F-measure
1 692 165 57 0 0.743 1 0.853
2 572 111 54 2 0.673 0.982 0.799
3 308 94 20 1 0.825 0.989 0.897
4 175 35 14 1 0.714 0.972 0.823
8 280 84 16 0 0.840 1 0.913
Conclusion
Any natural language follows certain rules, or grammar. Although the rules vary from language to language, many languages in wide use today share broadly similar grammatical syntax and structure. The present work proposes a mechanism to extract NPs from English text documents using such rules, specifically English morphological rules. The algorithm is simple, and although it may seem rather primitive, it offers some vital benefits since it incorporates subjective or aesthetic features of a natural language. The proposed approach may therefore generalize to other languages. The mechanism also extracts NPs in acceptable time. The average recall, precision and F-measure for the 50 English text documents are 97%, 74% and 84%, respectively.
English has a very large and still growing vocabulary, since it has borrowed words from many languages, such as Latin, Sanskrit and French; as a result, the number of nouns keeps increasing. This approach relies on storing a sizeable stop list, but the list is not exhaustive, so its coverage may constrain overall performance. Incorporating more composite rules into phrase extraction could enhance the model.
References
1 Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics and Speech Recognition”, Pear-
son, 2nd
Edition.
2 Bennett, Nuala A., et al. "Extracting noun phrases for all of MEDLINE." Proceedings of the
AMIA Symposium. American Medical Informatics Association, 1999.
A Rule based Approach for Noun Phrase Extraction from English Text Document | 25
3 Dhivya, R., Dhanalakshmi, V., Kumar, M. A., & Soman, K. P. (2012). Clause boundary identi-
fication for tamil language using dependency parsing. In Signal Processing and Information
Technology (pp. 195-197). Springer Berlin Heidelberg.
4 Nair, L. R., & Peter, S. D. (2011, October). Shallow parser for Malayalam language using finite
state cascades. In Biomedical Engineering and Informatics (BMEI), 2011 4th International
Conference on (Vol. 3, pp. 1264-1267). IEEE.
5 El-Kahlout, I. D., & Akın, A. A. (2013). Turkish constituent chunking with morphological and
contextual features. In Computational Linguistics and Intelligent Text Processing (pp. 270-
281). Springer Berlin Heidelberg.
26 | Neelotpal Chakraborty, Samir Malakar, Ram Sarkar and Mita Nasipuri
Jaya Gera¹ and Harmeet Kaur²
Recommending Investors using Association Rule Mining for Crowd Funding Projects
Abstract: Many projects fail to meet their funding goal because they lack
sufficient funders. Crowd funders are the key component of the crowdfunding
phenomenon. Their monetary support makes a project’s success possible, and
their decisions are based on the project’s quality and their own interests. In
this paper, we aim to help projects meet their goals by recommending promising
projects to potential funders. We have developed a recommendation model that
learns funders’ interests and recommends promising projects that match their
profiles. A profile is generated from a funder’s backing history. The
experiment is conducted on a Kickstarter dataset. Projects are analysed on
several aspects: project features, funding pattern during the funding cycle,
success probability, etc. Initially, recommendations are generated by mining
funders’ backing histories with association rules. Because few backers have
backed multiple projects, the data is sparse. Although association rule mining
is quite efficient and generates important rules, it is not able to promote
promising projects. So, recommendations are refined by identifying promising
projects on the basis of the percentage of funding received, pledge behaviour
and success probability.
Keywords: crowdfunding; recommender systems; association rule mining; user
interest; success probability; pledge behaviour.
1 Introduction
One of the most challenging tasks in setting up a new venture is arranging
sufficient funds. Although crowdfunding has emerged as a viable alternative for
raising funds for new ventures, not all ventures succeed in raising sufficient
funds. One of the most common reasons for failure is that venture initiators are
novices and have difficulty understanding and leveraging their social networks
to reach the right audience for their product [1][2]. The audience is diverse in
nature and spread across the globe. Its members are no longer just consumers of
products. They are taking on the role of wise investors/funders [3]. Crowd
funders not only provide monetary support but also motivate and influence other
funders’ decisions and help promote projects, which is essential for a
project’s success. The project initiator or creator alone cannot capture the
wisdom of funders or find funders who match the project’s profile. This has
led to the emergence of crowdfunding intermediaries, or crowdfunding platforms.
1 Department of Computer Science, Shyama Prasad Mukherji College, University of Delhi, Delhi,
India
jayagera@spm.du.ac.in
2 Department of Computer Science, Hans Raj College, University of Delhi, Delhi, India
hkaur@hrc.du.ac.in
Crowdfunding platforms act as intermediaries between project initiators and
potential funders [4]. A platform provides certain functionality and acts as an
electronic matching market that reduces information asymmetry and costs [5].
Platforms aim to maximize the number of successful projects [6]. To achieve
this, they need to design policies and strategies that motivate funders to fund
[7], so that more projects on the site get funded. This can be achieved by
analysing funders’ trends and the timing of their contributions, and by
coordinating among funders [7].
As crowdfunding has grown in popularity, crowdfunding platforms have mushroomed.
Most platforms do little more than provide a place to present projects and a
mechanism for online payment to collect pledges [8]. However, some of them
provide value-added services such as suggestions [8], help in expanding
creators’ networks [6], trust building and more. The efforts of these platforms
have a positive influence and help creators raise funds.
Figure 1. Source: https://www.kickstarter.com/
Some platforms also assess and promote projects on their sites. For example,
Kickstarter promotes projects in various ways: staff picks, project of the
day, popular projects, and projects that are popular among a user’s friends.
Figure 1 shows one such snapshot of Kickstarter (retrieved on 5 Jan 2016).
In the literature, no attention has been paid to matching projects with
suitable funders in order to promote them. The proposed work develops a method
that generates rules from backers’ funding patterns, learns funders’ profiles
and funding trends, matches projects with these profiles and recommends them to
funders. The aim is to assist project initiators in raising funds and to
improve the performance and functionality of the platforms. The rest of the
paper discusses related work, then the dataset and its characteristics,
followed by the proposed work and the conclusion.
2 Related Work
Although crowdfunding is now a mature domain, the dynamics of crowdfunding
platforms are not well understood [4]. Most of the literature focuses on
analysing various factors and their impact on the success of crowdfunding
projects. These factors include the role of social media and geography, the
impact of social network size on success, the motivation behind investment
decisions, and the effect of timing and coordination of investors. Less
emphasis has been given to understanding the role of platforms, how they make
policies, and how their existing functionalities can be understood and
extended. Some researchers have paid attention to these dimensions and brought
new insights about crowdfunding intermediaries.
Ref. [5] developed an empirical taxonomy of crowdfunding platforms that
characterizes crowdfunding intermediation models on the basis of Hedonism,
Altruism and Profit. They also examined how crowdfunding intermediaries manage
financial intermediation and how they transform the relations between
initiators and funders in two-sided online markets.
Ref. [9] proposed different ways of recommending investors based on Twitter
data. Recommendations are generated on the basis of the pledge behaviour of
frequent and occasional investors. The study suggested that frequent investors
are attracted by ambitious projects, whereas occasional investors act as donors
and mainly invest in art-related projects.
Ref. [10] categorized Kickstarter project features as social, temporal and
geographical features and analysed their impact on project success. The study
also built a recommendation model using gradient boosting trees that uses these
features to recommend a set of backers for Kickstarter projects.
Ref. [7] analysed donors’ contributions, the timing of those contributions, the
coordination among donors, and their impact on project funding. Ref. [11]
suggests that donors’ funding decisions play an important role in the ultimate
success of a crowdfunding project. Before making their own funding decisions,
potential donors look at the level of support from other backers as well as its
timing. Ref. [12] observed the temporal distribution of customer interest. It
concluded that there is a strong correlation between a crowdfunding project’s
early promotional activities and its final outcome. They also discussed the
importance of promoting projects concurrently through multiple sources.
Ref. [1] revealed that interacting and connecting with certain key individuals
helps creators acquire resources and spread information about their projects.
The study also found that a small portion of funds comes from strong ties
(family, friends, colleagues, relatives, etc.). A large portion comes from
weak ties, i.e., people in the creator’s network whom the creator has rarely
met or interacted with. Ref. [5] also suggested that matching projects with
their potential investors enables successful funding.
Various studies suggest that the crowdfunding market is growing fast in all
respects, i.e., the volume of projects launched every day, the number of funders
and the number of platforms. But an increase in volume does not imply an
increase in performance. So, there is a need for a mechanism that matches
projects and funders. The contributions of this work are:
i. Adding to platform functionality by automatically matching projects with
potential funders
ii. Understanding funders and their interests by maintaining their profiles
iii. Assisting initiators in evaluating the success prospects of their projects
3 Data Set
This experiment is conducted on Kickstarter data. The dataset consists of data
about projects, their funding histories and their backers. Projects and their
funding histories were obtained from the Kickspy¹ website. The backers’ data
for each project was obtained by crawling the backers’ pages of the
Kickstarter² website. The dataset contains 4,862 projects launched in April
2014 and the backing histories of the 97,608 backers who backed these projects.
Project data includes project id, project name, goal amount, pledged amount,
status, category, subcategory, duration, rewards, Facebook friends, Facebook
shares, start date, end date, etc. Pledge data consists of the amount pledged
on each day of these projects’ funding cycles. The dataset also contains data
on live, suspended and cancelled projects. These projects and their backing
transactions were removed for the analysis. After their removal, the dataset is
left with 4,121 projects and 92,770 backers. Of the 4,121 projects, 1,899 (46%)
are successful and 2,232 (54%) are unsuccessful.
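The removal step explains why the backer count drops along with the project count. A backer disappears once all of their backings were on removed projects. Below is a minimal sketch under stated assumptions. The status labels (`"successful"`, `"failed"`) and the `(backer_id, project_id)` pair layout are hypothetical and do not describe the authors' actual data format.

```python
from typing import Dict, List, Set, Tuple

# Assumed status labels for finished campaigns. Live, suspended and
# cancelled projects carry other labels and are dropped.
KEPT_STATUSES = {"successful", "failed"}


def filter_finished(
    projects: Dict[str, str], backings: List[Tuple[str, str]]
) -> Tuple[Set[str], List[Tuple[str, str]], Set[str]]:
    """Keep finished projects and the backing transactions that refer to them.

    projects: project_id -> status
    backings: (backer_id, project_id) pairs
    Returns (kept project ids, kept backings, backers still present).
    """
    kept = {pid for pid, status in projects.items() if status in KEPT_STATUSES}
    kept_backings = [(b, p) for b, p in backings if p in kept]
    # A backer is dropped if every one of their backings was on a removed project.
    backers = {b for b, _ in kept_backings}
    return kept, kept_backings, backers
```

Counting the returned project set and backer set gives the post-filtering totals reported for the dataset.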
To better understand individual project characteristics, a mean value analysis
was performed. Table 1 lists the mean values of some project characteristics.
Table 1. Mean Value Analysis
Characteristic      All         Successful     Unsuccessful
Projects            4,121       1,899 (46%)    2,232 (54%)
Goal Amount         54537.69    10882.21       91484.46
Pledged Amount      11393.78    22166.31       2276.70
Backers             139.70      272.80         27.06
Rewards             9.82        11.30          8.57
Updates             3.18        5.25           1.42
Comments            19.62       38.95          3.26
Duration (days)     31.81       30.14          33.21
Facebook Friends    711.25      823.66         613.07
In a nutshell, successful campaigns on average have lower goals, shorter
durations, and significantly more funders and Facebook friends supporting them.
They also offer more rewards and show better interaction between creators and
funders through updates and comments.
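A mean value analysis like Table 1's amounts to averaging each characteristic twice. First over all projects, then separately over each outcome group. The following is a small sketch of that computation. It assumes each project is a dict with a `"status"` key and numeric fields. The field names are hypothetical, and after filtering every non-successful project is treated as unsuccessful.

```python
from typing import Dict, List


def mean_by_outcome(projects: List[dict], fields: List[str]) -> Dict[str, Dict[str, float]]:
    """Mean of each field over all projects and per outcome group.

    Each project is a dict with a "status" key ("successful" or "failed")
    plus numeric fields; the field names are hypothetical.
    """
    groups = {
        "All": projects,
        "Successful": [p for p in projects if p["status"] == "successful"],
        "Unsuccessful": [p for p in projects if p["status"] != "successful"],
    }
    return {
        name: {f: sum(p[f] for p in rows) / len(rows) for f in fields}
        for name, rows in groups.items()
        if rows  # skip empty groups to avoid division by zero
    }
```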
4 Proposed work
A large number of projects fail because they are unable to publicize themselves
and attract a sufficient number of funders. For a project to be successful, it
must reach its funding goal. To reach its goal, it needs a sufficient number of
investors who are willing to invest and take risks. Research reveals that
20-40% of initial funding comes from family and friends [13]. However, a large
number of funders are unknown to the creator and fund for various reasons. With
the growth of technology and improved security, a large number of creators as
well as funders are participating online. These contributions are small in
amount and spread across various projects [14]. The funding amounts are not
very large and come from a large network of unknown people. Investors’ funding
therefore needs to be coordinated [14] so that more projects succeed. To assist
potential funders, we have developed a recommender model that learns from
funders’ backing histories using association rule mining. It then recommends
and promotes projects among potential funders.
1 http://www.kickspy.com. This website is currently shut down.
2 https://www.kickstarter.com/. The backers’ pages have since been removed from the Kickstarter website.
4.1 Method
Our aim is to assist initiators, funders and platforms so that the overall
success rate of the platform increases and all stakeholders benefit. Some
important questions are: Which projects need to be promoted? What criteria
should be used to identify such projects? Projects that signal high quality and
are popular in social networks get funded quickly. Projects that signal low
quality raise nothing or very little. Such projects may not get funded even by
friends and family. The best candidates for promotion are projects of good
quality that perform well initially but lose momentum later on. The model
identifies such projects by analysing their quality and funding pattern. Fig. 2
shows the model components.
The recommendation model has five modules:
i) Predictor
ii) Trend Monitor
iii) Profile Modeller
iv) Rule Generator
v) Recommender
Predictor: Some projects perform well on the monetary front and attract more
than the required amount. Others perform poorly and attract little or no
investment. Project success is also influenced by project quality [15], which is
assessed through project preparation and presentation. Assessing the true state
of project preparation is not feasible, because creators disclose only as much
as they wish to. Crowdfunding suffers from information asymmetry [6], i.e., the
creator knows the actual situation, whereas a funder can only assess it from the
information disclosed. So, in this module, project success is evaluated on the
basis of project presentation. A project is characterized by various features
such as category, whether it has a video, number of videos, number of images,
goal amount, duration, Facebook friends, etc. These features are good
indicators of project quality. Project success is predicted by feeding these
attributes to a logistic regression model. This module predicts project success
with 81.5% accuracy.
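Logistic regression can be sketched in pure Python without assuming any particular ML library. This version fits weights by plain batch gradient descent on the log-loss. The paper does not say how its model was trained. The three feature columns used below are hypothetical and pre-scaled to roughly [0, 1]: has video, images/10, and goal/100,000. The sketch does not reproduce the reported accuracy.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 1000) -> List[float]:
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over the batch and take one step.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_success(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that a project with feature vector x succeeds."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```

As a usage example, train on a handful of labelled projects, with 1 meaning successful. Then classify a new project as likely to succeed when `predict_success` exceeds 0.5.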
Figure 2. Recommender model
Trend Monitor: The Predictor’s prediction is based on static features available
when a project is launched, such as goal amount and category. It does not assess
the project’s performance after launch. Our aim is to promote projects that are
of good quality but could not raise enough and fall short by a small margin. We
need to identify projects whose presentations are as good as those of
successful ones but which grow slowly during their funding cycle. This can be
done by monitoring their funding behaviour, which requires understanding the
nature of successful and unsuccessful funding patterns. Successful projects
generally grow faster than unsuccessful ones. Pledge analysis [16] states that
a campaign’s success probability is high if it has raised approximately 20% of
its goal within the first 15% of its funding cycle. Unsuccessful projects often
start well but fail to sustain this growth after some time. So, campaigns that
raise 20% of their funds within the first 20% of their funding time are good
candidates for promotion. The Trend Monitor module analyses each project’s
funding behaviour and identifies such projects.
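The Trend Monitor's criterion is to raise 20% of the goal within the first 20% of the funding time. It can be written as a small predicate. This is a sketch under several assumptions. Pledges are given per day, and the cycle length is the number of days in the list. The early window is the whole number of days that fit inside 20% of the cycle, with at least one day. The function name is hypothetical.

```python
import math
from typing import Sequence


def early_momentum(daily_pledges: Sequence[float], goal: float,
                   time_frac: float = 0.2, fund_frac: float = 0.2) -> bool:
    """True if at least `fund_frac` of the goal was raised within the first
    `time_frac` of the funding cycle.

    daily_pledges[d] is the amount pledged on day d (assumed layout).
    """
    if goal <= 0 or not daily_pledges:
        raise ValueError("goal must be positive and pledges non-empty")
    # Whole days inside the early window, but never fewer than one.
    window = max(1, math.floor(time_frac * len(daily_pledges)))
    raised_early = sum(daily_pledges[:window])
    return raised_early >= fund_frac * goal
```

For a 30-day campaign, the window is the first 6 days. A project with a 1,000 goal qualifies if those days bring in at least 200. Setting `time_frac=0.15` gives the stricter 15% rule quoted from the pledge analysis.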
Profile Modeller: This module learns a backer’s profile by analysing the
backer’s funding history. The profile of a backer $B_i$ is defined as
$$B_i = \{\mathit{Backer\_id}_i,\ \mathit{Name}_i,\ \mathit{Location}_i,\ \mathit{CategoryPref}_i\}$$
Each backer is assigned a unique identifier, backer_id. The Name attribute
contains the name of the backer, and the Location attribute contains the
backer’s address and city. CategoryPref is generated by scanning the backing
history and finding the category and subcategory of each project backed.
$\mathit{CategoryPref}_i$ is a set defined as:
$$\mathit{CategoryPref}_i = \{\{Cat_{j_1}, subcat_{k_1}, n_{k_1}\}, \{Cat_{j_2}, subcat_{k_2}, n_{k_2}\}, \ldots, \{Cat_{j_m}, subcat_{k_m}, n_{k_m}\}\}$$
i.e., backer $B_i$ has supported $n_{k_a}$ projects of subcategory $k_a$ of
category $j_a$, for $a = 1, \ldots, m$.
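The profile definition maps naturally onto a record plus a counter keyed by (category, subcategory). Each count plays the role of n_{k_a}. This is a minimal sketch. The class and function names, the project-to-category lookup, and the example category names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

Category = Tuple[str, str]  # (category, subcategory)


@dataclass
class BackerProfile:
    """B_i = {Backer_id_i, Name_i, Location_i, CategoryPref_i}."""
    backer_id: str
    name: str
    location: str
    # CategoryPref_i: (category, subcategory) -> number of projects backed.
    category_pref: Counter = field(default_factory=Counter)


def build_profile(backer_id: str, name: str, location: str,
                  backed: Iterable[str],
                  project_category: Dict[str, Category]) -> BackerProfile:
    """Scan a backer's history and count backed projects per subcategory."""
    pref = Counter(project_category[pid] for pid in backed)
    return BackerProfile(backer_id, name, location, pref)
```

`profile.category_pref.most_common()` then lists the backer's preferred subcategories in decreasing order of how often they were backed.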
Rule Generator: The recommender system not only identifies projects to be
promoted but also understands the trends of backers and learns which projects
backers frequently back. To understand the behaviour patterns of backers, we
used the association rule mining technique of data mining. Association rule
mining aims to extract interesting correlations, frequent patterns,
associations or causal structures among sets of items in transaction databases
or other data repositories [17]. Association rule mining is generally used in
market basket analysis, where it mines the transaction history and tells which
items are frequently bought together. As we are interested in knowing which
projects are backed together by different backers, we have used the association
rule mining technique.
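Rules are generated by applying the Apriori algorithm for association rule
mining. Two parameters, support and confidence, are used to measure the
interestingness of rules. Rules that satisfy both the minimum support and the
minimum confidence are referred to as strong association rules and are of
interest [17][18]. For association rules of the form X ⇒ Y, where X and Y are
sets of items, support and confidence are defined as:
$$\mathrm{support}(X \Rightarrow Y) = \frac{\text{number of transactions with } X \wedge Y}{\text{total number of transactions}}$$
$$\mathrm{confidence}(X \Rightarrow Y) = \frac{\text{number of transactions with } X \wedge Y}{\text{number of transactions containing } X}$$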
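The support and confidence definitions can be made concrete with a small sketch. Each transaction is taken to be the set of project IDs that one backer backed; this representation is an assumption. "Transactions with X ∧ Y" are those containing every item of X and of Y.

```python
from typing import AbstractSet, Iterable, Sequence


def support(itemset: Iterable[str], transactions: Sequence[AbstractSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: Sequence[AbstractSet[str]]) -> float:
    """Share of transactions containing X that also contain Y."""
    x = frozenset(x)
    sx = support(x, transactions)
    # count(X and Y) / count(X) equals support(X and Y) / support(X).
    return support(x | frozenset(y), transactions) / sx if sx else 0.0
```

For example, take four backers with project sets {A, B}, {A, C}, {A, B, C} and {B}. The rule A ⇒ B then has support 2/4 = 0.5 and confidence 2/3, since two of the three backers of A also backed B.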
Association rule mining has two phases: i) finding frequent itemsets and ii)
generating rules. The first phase finds itemsets that satisfy the minimum
support count. The second phase generates rules from these itemsets and keeps
those that satisfy the confidence threshold.
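The two phases can be sketched as a compact, textbook-style Apriori implementation. This is not the authors' code, and the thresholds in the usage below are illustrative only; this section does not state the values the paper used. Phase 1 grows frequent itemsets level by level. It joins frequent (k-1)-itemsets and prunes any candidate that has an infrequent subset. Phase 2 splits each frequent itemset into X ⇒ Y and keeps the rules that meet the confidence threshold. Transactions are assumed to be per-backer sets of project IDs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Set[str]], min_count: int) -> Dict[Itemset, int]:
    """Phase 1: level-wise Apriori search; returns itemset -> support count."""
    level: Dict[Itemset, int] = {}
    for t in transactions:  # count 1-itemsets
        for item in t:
            key = frozenset([item])
            level[key] = level.get(key, 0) + 1
    level = {s: c for s, c in level.items() if c >= min_count}

    frequent: Dict[Itemset, int] = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
    return frequent


def generate_rules(frequent: Dict[Itemset, int],
                   min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Phase 2: split each frequent itemset into X => Y and keep confident rules."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                x = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so it was counted.
                conf = count / frequent[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules
```

As a usage example, five backers might have backed hypothetical projects {P1, P2, P3}, {P1, P2}, {P1, P3}, {P2, P3} and {P1, P2, P3}. With a minimum support count of 3, every single project and every pair is frequent, but the triple (count 2) is not. Each of the six pairwise rules, such as P1 ⇒ P2, then has confidence 3/4.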
In this dataset, each transaction is the list of projects backed by a backer.
Let us understand the Apriori algorithm with the help of an example.
IN THE TEXT
Flint Arrow-heads
Flint Scrapers
A Cooking-pot
Flint Scrapers
Fragment of Cooking-pot
Cross, Whitchurch Down
Plan of Hut, Shapley Common
Hut Circle, Grimspound
Logan Rock. The Rugglestone, Widdecombe
Roos Tor Logans
Covered Chamber, Whit Tor
Construction of Stone and Timber Wall
Tin-workings, Nillacombe
Mortar-stone, Okeford
Slag-pounding Hollows, Gobbetts
Smelting in 1556
Plan of Blowing-house, Deep Swincombe
Tin-mould, Deep Swincombe
Smelting Tin in Japan
A Primitive Hinge
Inscription on Sourton Cross
Inscribed Stone, Sticklepath
Plan of Stone Rows near Caistor Rock
" " Grimspound
" " Hut at Grimspound
Fragment of Pottery
Ornamented Pottery
Tom Pearce's Ghostly Mare
Crazing-mill Stone, Upper Gobbetts
Method of using the Mill-stones
Chancel Capital, Meavy
Blowing-house below Black Tor
DARTMOOR
CHAPTER I.
BOGS
The rivers that flow from Dartmoor—The bogs are their
cradles—A tailor lost on the moor—A man in Aune Mire
—Some of the worst bogs—Cranmere Pool—How the
bogs are formed—Adventure in Redmoor Bog—Bog
plants—The buckbean—Sweet gale—Furze—Yellow
broom—Bee-keeping.
Dartmoor proper consists of that upland region of granite, rising to
nearly 2,000 feet above the sea, and actually shooting above
that height at a few points, which is the nursery of many of the
rivers of Devon.
The Exe, indeed, has its source in Exmoor, and it disdains to receive
any affluents from Dartmoor; and the Torridge takes its rise hard by
the sea at Wellcombe, within a rifle-shot of the Bristol Channel,
nevertheless it makes a graceful sweep—tenders a salute—to
Dartmoor, and in return receives the liberal flow of the Okement.
The Otter and the Axe, being in the far east of the county, rise in the
range of hills that form the natural frontier between Devon and
Somerset.
But all the other considerable streams look back upon Dartmoor as
their mother.
And what a mother! She sends them forth limpid and pure, full of
laughter and leap, of flash and brawl. She does not discharge them
laden with brown mud, as the Exe, nor turned like the waters of
Egypt to blood, as the Creedy.
A prudent mother, she feeds them regularly, and with considerable
deliberation. Her vast bogs act as sponges, absorbing the winter
rains, and only leisurely and prudently does she administer the
hoarded supply, so that the rivers never run dry in the hottest and
most rainless summers.
Of bogs there are two sorts, the great parental peat deposits that
cover the highland, where not too steep for them to lie, and the
swamps in the bottoms formed by the oozings from the hills that
have been arrested from instant discharge into the rivers by the
growth of moss and water-weeds, or are checked by belts of gravel
and boulder. To see the former, a visit should be made to Cranmere
Pool, or to Cut Hill, or Fox Tor Mire. To get into the latter a stroll of
ten minutes up a river-bank will suffice.
The existence of the great parent bogs is due either to the fact that
beneath them lies the impervious granite, as a floor, somewhat
concave, or to the whole rolling upland being covered, as with a
quilt, with equally impervious china-clay, the fine deposit of feldspar
washed from the granite in the course of ages.
In the depths of the moor the peat may be seen riven like floes of
ice, and the rifts are sometimes twelve to fourteen feet deep, cut
through black vegetable matter, the product of decay of plants
through countless generations. If the bottom be sufficiently denuded
it is seen to be white and smooth as a girl's shoulder—the kaolin
that underlies all.
On the hillsides, and in the bottoms, quaking-bogs may be lighted
upon or tumbled into. To light upon them is easy enough, to get out
of one if tumbled into is a difficult matter. They are happily small,
and can be at once recognised by the vivid green pillow of moss that
overlies them. This pillow is sufficiently close in texture and buoyant
to support a man's weight, but it has a mischievous habit of thinning
around the edge, and if the water be stepped into where this fringe
is, it is quite possible for the inexperienced to go under, and be
enabled at his leisure to investigate the lower surface of the covering
duvet of porous moss. Whether he will be able to give to the world
the benefit of his observations may be open to question.
The thing to be done by anyone who gets into such a bog is to
spread his arms out—this will prevent his sinking—and if he cannot
struggle out, to wait, cooling his toes in bog water, till assistance
comes. It is a difficult matter to extricate horses when they flounder
in, as is not infrequently the case in hunting; every plunge sends the
poor beasts in deeper.
One afternoon, in the year 1851, I was in the Walkham valley above
Merrivale Bridge digging into what at the time I fondly believed was
a tumulus, but which I subsequently discovered to be a mound
thrown up for the accommodation of rabbits, when a warren was
contemplated on the slope of Mis Tor.
Towards evening I was startled to see a most extraordinary object
approach me—a man in a draggled, dingy, and disconsolate
condition, hardly able to crawl along. When he came up to me he
burst into tears, and it was some time before I could get his story
from him. He was a tailor of Plymouth, who had left his home to
attend the funeral of a cousin at Sampford Spiney or Walkhampton, I
forget which. At that time there was no railway between Tavistock
and Launceston; communication was by coach.
When the tailor, on the coach, reached Roborough Down, "'Ere you
are!" said the driver. "You go along there, and you can't miss it!"
indicating a direction with his whip.
So the tailor, in his glossy black suit, and with his box-hat set jauntily
on his head, descended from the coach, leaped into the road, his
umbrella, also black, under his arm, and with a composed
countenance started along the road that had been pointed out.
Where and how he missed his way he could not explain, nor can I
guess, but instead of finding himself at the house of mourning, and
partaking there of cake and gin, and dropping a sympathetic tear, he
got up on to Dartmoor, and got—with considerable dexterity—away
from all roads.
He wandered on and on, becoming hungry, feeling the gloss go out
of his new black suit, and raws develop upon his top-hat as it got
knocked against rocks in some of his falls.
Night set in, and, as Homer says, "all the paths were darkened"—but
where the tailor found himself there were no paths to become
obscured. He lay in a bog for some time, unable to extricate himself.
He lost his umbrella, and finally lost his hat. His imagination
conjured up frightful objects; if he did not lose his courage, it was
because, as a tailor, he had none to lose.
He told me incredible tales of the large, glaring-eyed monsters that
had stared at him as he lay in the bog. They were probably sheep,
but as nine tailors fled when a snail put out its horns, no wonder
that this solitary member of the profession was scared at a sheep.
The poor wretch had eaten nothing since the morning of the
preceding day. Happily I had half a Cornish pasty with me, and I
gave it him. He fell on it ravenously.
Then I showed him the way to the little inn at Merrivale Bridge, and
advised him to hire a trap there and get back to Plymouth as quickly
as might be.
"I solemnly swear to you, sir," said he, "nothing will ever induce me
to set foot on Dartmoor again. If I chance to see it from the Hoe, sir,
I'll avert my eyes. How can people think to come here for pleasure—
for pleasure, sir! But there, Chinamen eat birds'-nests. There are
depraved appetites among human beings, and only unwholesome-
minded individuals can love Dartmoor."
There is a story told of one of the nastiest of mires on Dartmoor,
that of Aune Head. A mire, by the way, is a peculiarly watery bog,
that lies at the head of a river. It is its cradle, and a bog is
distributed indiscriminately anywhere.
A mire cannot always be traversed in safety; much depends on the
season. After a dry summer it is possible to tread where it would be
death in winter or after a dropping summer.
A man is said to have been making his way through Aune Mire when
he came on a top-hat reposing, brim downwards, on the sedge. He
gave it a kick, whereupon a voice called out from beneath, "What be
you a-doin' to my 'at?" The man replied, "Be there now a chap
under'n?" "Ees, I reckon," was the reply, "and a hoss under me
likewise."
There is a track through Aune Head Mire that can be taken with
safety by one who knows it.
Fox Tor Mire once bore a very bad name. The only convict who really
got away from Princetown and was not recaptured was last seen
taking a bee-line for Fox Tor Mire. The grappling irons at the disposal
of the prison authorities were insufficient for the search of the whole
marshy tract. Since the mines were started at Whiteworks much has
been done to drain Fox Tor Mire, and to render it safe for grazing
cattle on and about it.
There is a nasty little mire at the head of Redaven Lake, between
West Mill Tor and Yes Tor, and there is a choice collection of them,
inviting the unwary to their chill embraces, on Cater's Beam, about
the sources of the Plym and Blacklane Brook, the ugliest of all
occupying a pan and having no visible outlet. The Redlake mires are
also disposed to be nasty in a wet season, and should be avoided at
all times. Anyone having a fancy to study the mires and explore
them for bog plants will find an elegant selection around Wild Tor, to
be reached by ascending Taw Marsh and mounting Steeperton Tor,
behind which he will find what he desires.
"On the high tableland," says Mr. William Collier, "above
the slopes, even higher than many tors, are the great
bogs, the sources of the rivers. The great northern bog is
a vast tract of very high land, nothing but bog and sedge,
with ravines down which the feeders of the rivers pour.
Here may be found Cranmere Pool, which is now no pool
at all, but just a small piece of bare black bog. Writers of
Dartmoor guide-books have been pleased to make much
of this Cranmere Pool, greatly to the advantage of the
living guides, who take tourists there to stare at a small
bit of black bog, and leave their cards in a receptacle
provided for them. The large bog itself is of interest as the
source of many rivers; but there is absolutely no interest
in Cranmere Pool, which is nothing but a delusion and a
snare for tourists. It was a small pool years ago, where
the rain water lodged; but at Okement Head hard by a fox
was run to ground, a terrier was put in, and by digging
out the terrier Cranmere Pool was tapped, and has never
been a pool since. So much for Cranmere Pool!
"This great northern bog, divided into two sections by Fur
Tor and Fur Tor Cut, extends southwards to within a short
distance of Great Mis Tor, and is a vast receptacle of rain,
which it safely holds throughout the driest summer. Fur
Tor Cut is a passage between the north and south parts of
this great bog, evidently cut artificially for a pass for cattle
and men on horseback from Tay Head, or Tavy Head, to
East Dart Head, forming a pass from west to east over the
very wildest part of Dartmoor. Anyone can walk over the
bogs; there is no danger or difficulty to a man on foot
unless he gets exhausted, as some have done. But horses,
bullocks, and sheep cannot cross them. A man on
horseback must take care where he goes, and this Fur Tor
Cut is for his accommodation."[1]
The Fur Tor Mire is not composed of black but of a horrible yellow
slime. There is no peat in it, and to cross it one must leap from one
tuft of coarse grass to another. The "mires" are formed in basins of
the granite, which were originally lakes or tarns, and into which no
streams fall bringing down detritus. They are slowly and surely filling
with vegetable matter, water-weeds that rot and sink, and as this
vegetable matter accumulates it contracts the area of the water
surface. In the rear of the long sedge grass or bogbean creeps the
heather, and a completely choked-up mire eventuates in a peat bog.
Granite has a tendency to form saucer-like depressions. In the
Bairischer Wald, the range dividing Bavaria from Bohemia, are a
number of picturesque tarns, that look as though they occupied the
craters of extinct volcanoes. This, however, is not the case; the rock
is granite, but in this case the lakes are so deep that they have not
as yet been filled with vegetable deposit. On the Cornish moors is
Dosmare Pool. This is a genuine instance of the lake in a granitic
district. In Redmoor, near Fox Tor, on the same moors, we have a
similar saucer, with a granitic lip, over which it discharges its
superfluous water, but it is already so much choked with vegetable
growth as to have become a mire. Ten thousand years hence it will
be a great peat bog.
I had an adventure in Redmoor, and came nearer looking into the
world beyond than has happened to me before or since. Although it
occurred on the Cornish moors, it might have chanced on Dartmoor,
in one of its mires, for the character of both is the same, and I was
engaged in the same autumn on both sets of moors. Having been
dissatisfied with the Ordnance maps of the Devon and Cornish
moors, and desiring that certain omissions should be corrected, I
appealed to Sir Charles Wilson, of the Survey, and he very readily
sent me one of his staff, Mr. Thomas, to go over the ground with
me, and fill in the particulars that deserved to be added. This was in
1891. The summer had been one of excessive rain, and the bogs
were swollen to bursting. Mr. Thomas and I had been engaged, on
November 5th, about Trewartha Marsh, and as the day closed in we
started for the inhabited land and our lodgings at "Five Lanes." But
in the rapidly closing day we went out of our course, and when
nearly dark found ourselves completely astray, and worst of all in a
bog. We were forced to separate, and make our way as best we
could, leaping from one patch of rushes or moss to another. All at
once I went in over my waist, and felt myself being sucked down as
though an octopus had hold of me. I cried out, but Thomas could
neither see me nor assist me had he been able to approach.
Providentially I had a long bamboo, like an alpenstock, in my hand,
and I laid this horizontally on the surface and struggled to raise
myself by it. After some time, and with desperate effort, I got myself
over the bamboo, and was finally able to crawl away like a lizard on
my face. My watch was stopped in my waistcoat pocket, one of my
gaiters torn off by the suction of the bog, and I found that for a
moment I had been submerged even over one shoulder, as it was
wet, and the moss clung to it.
On another occasion I went with two of my children, on a day when
clouds were sweeping across the moor, over Langstone Moor. I was
going to the collection of hut circles opposite Greenaball, on the
shoulder of Mis Tor. Unhappily, we got into the bog at the head of
Peter Tavy Brook. This is by no means a dangerous morass, but after
a rainy season it is a nasty one to cross.
Simultaneously down on us came the fog, dense as cotton wool. For
quite half an hour we were entangled in this absurdly insignificant
bog. In getting about in a mire, the only thing to be done is to leap
from one spot to another where there seems to be sufficient growth
of water-plants and moss to stay one up. In doing this one loses all
idea of direction, and we were, I have no doubt, forming figures of
eight in our endeavours to extricate ourselves. I knew that the
morass was inconsiderable in extent, and that by taking a straight
line it would be easy to get out of it, but in a fog it was not possible
to take a bee-line. Happily, for a moment the curtain of mist lifted,
and I saw on the horizon, standing up boldly, the stones of the great
circle that is planted on the crest. I at once shouted to the children
to follow me, and in two minutes we were on solid land.
The Dartmoor bogs may be explored for rare plants and mosses.
The buckbean will be found and recognised by its three succulent
sea-green leaflets, and by its delicately beautiful white flower tinged
with pink, in June and July. I found it in 1861 in abundance in
Iceland, where it is called Alptar colavr, the swan's clapper. About
Hamburg it is known as the "flower of liberty," and grows only within
the domains of the old Hanseatic Republic. In Iceland it serves a
double purpose. Its thickly interwoven roots are cut and employed in
square pieces like turf or felt as a protection for the backs of horses
that are laden with packs. Moreover, in crossing a bog, the clever
native ponies always know that they can tread safely where they see
the white flower stand aloft.
The golden asphodel is common, and remarkably lovely, with its
shades of yellow from the deep-tinted buds to the paler expanded
flower. The sundew is everywhere that water lodges; the sweet gale
has foliage of a pale yellowish green sprinkled over with dots, which
are resinous glands. The berries also are sprinkled with the same
glands. The plant has a powerful, but fresh and pleasant, odour,
which insects dislike. Country people were wont to use sprigs of it,
like lavender, to put with their linen, and to hang boughs above their
beds. The catkins yield a quantity of wax. The sweet gale was
formerly much more abundant, and was largely employed; it went
by the name of the Devonshire myrtle. When boiled, the wax rises to
the surface of the water. Tapers were made of it, and were so
fragrant while burning, that they were employed in sick-rooms. In
Prussia, at one time, they were constantly furnished for the royal
household.
The marsh helleborine, Epipactis palustris, may be gathered, and the
pyramidal orchis, and butterfly and frog orchises, occasionally.
The furze—only out of bloom when Love is out of tune—keeps away
from the standing water. It is the furze which is the glory of the
moor, with its dazzling gold and its honey breath, fighting for
existence against the farmer who fires it every year, and envelops
Dartmoor in a cloud of smoke from March to June. Why should he
do this instead of employing the young shoots as fodder?
I think that as Scotland has the thistle, Ireland the shamrock, and
Wales the leek as their emblems, we Western men of Devon and
Cornwall should adopt the furze. If we want a day, there is that of
our apostle S. Petrock, on June 4th.
By the streams and rivers and on hedge-banks the yellow broom
blazes, yet it cannot rival in intensity of colour and in variety of tint
the magnificent furze or gorse. But the latter is not a pleasant plant
to walk amidst, owing to its prickles, and especial care must be
observed lest it affix one of these in the knee. The spike rapidly
works inwards and produces intense pain and lameness. The
moment it is felt to be there, the thing to be done is immediately to
extract it with a knife. From the blossoms of the furze the bees
derive their aromatic honey, which makes that of Dartmoor supreme.
Yet beekeeping is a difficulty there, owing to the gales, that sweep
the busy insects away, so that they fail to find their direction home.
Only in sheltered combes can they be kept.
The much-relished Swiss honey is a manufactured product of
glycerine and pear-juice; but Dartmoor honey is the sublimated
essence of ambrosial sweetness in taste and savour, drawn from no
other source than the chalices of the golden furze, and compounded
with no adventitious matter.
FOOTNOTES:
[1] "Dartmoor," in the Transactions of the Plymouth Institution,
1897-8.
CHAPTER II.
TORS
Dartmoor from a distance—Elevation—The tors—Old lake-beds
—"Clitters"—The boldest tors—Luminous moss—The
whortleberry—Composition of granite—Wolfram—The "forest"
and its surrounding commons—Venville parishes—
Encroachment of culture on the moor—The four quarters—A
drift—Attempts to reclaim the moor—Flint finds—The inclosing
of commons.
Seen from a distance, as for instance from Winkleigh churchyard, or from
Exbourne, Dartmoor presents a stately appearance, as a ridge of blue
mountains rising boldly against the sky out of rolling, richly wooded
underland.
But it is only from the north and north-west that it shows so well. From south
and east it has less dignity of aspect, partly because the middle distance is
made up of hills, and partly because the heights of the encircling tors are not
so considerable, nor is their outline so bold.
Indeed, the southern edge of Dartmoor is conspicuously tame. It has no
abrupt and rugged heights, no chasms cleft and yawning in the range, such
as those of the Okement and the Tavy and Taw. And to the east much high
ground is found rising in stages to the fringe of the heather-clothed tors.
A TOR, SHOWING WEATHERING OF GRANITE
Dartmoor, consisting mainly of a great upheaved mass of granite, and of a
margin of strata that have been tilted up round it, forms an elevated region
some thirty-two miles from north to south and twenty from east to west. The
heated granite has altered the slates in contact with it, and is itself broken
through on the west side by an upward gush of molten matter which has
formed Whit Tor and Brent Tor.
The greatest elevations are reached on the outskirts, and there, also, is the
finest scenery. The interior consists of rolling upland. It has been likened to a
sea after a storm suddenly arrested and turned to stone; but a still better
resemblance, if not so romantic, is that of a dust-sheet thrown over the
dining-room chairs, the backs of which resemble the tors divided from one
another by easy sweeps of turf.
Most of the heights are crowned with masses of rock standing up like old
castles; these, and these only, are tors.[2] Such are the worn-down stumps of
vast masses of mountain formation that have disappeared. There are no lakes
on or about the moor, but this was not always so. Where is now Bovey
Heathfield was once a noble sheet of water fifty fathoms deep. Here have
been found beds of lignite, forests that have been overwhelmed by the wash
from the moor, a canoe rudely hollowed out of an oak, and a curious wooden
idol was exhumed leaning against a trunk of tree that had been swallowed up
in a freshet. The canoe was nine feet long. Bronze spear-heads have also
been found in this ancient lake, and moulds for casting bronze instruments. A
representation of the idol was given in the Transactions of the Devonshire
Association for 1875.
The new Plymouth Reservoir overlies an old lake-bed. Taw Marsh was also
once a sheet of rippling blue water, but the detritus brought down in the
weathering of what once were real mountains has filled them all up. Dartmoor
at present bears the same relation to Dartmoor in the far past that the gums
of an old hag bear to the pearly range she wore when a fresh girl. The granite
of Dartmoor was not well stirred before it was turned out, consequently it is
not homogeneous. Granite is made up of many materials: hornblende,
feldspar, quartz, mica, schorl, etc. Sometimes we find white mica, sometimes
black. Some granite is red, as at Trowlesworthy, and the beautiful band that
crosses the Tavy at the Cleave; sometimes pink, as at Leather Tor; sometimes
greenish, as above Okery Bridge; sometimes pure white, as at Mill Tor.
The granite is of very various consistency, and this has given it an appearance
on the tors as if it were a sedimentary rock laid in beds. But this is its little
joke to impose on the ignorant. The feature is due to the unequal hardness of
the rock which causes it to weather in strata.
The fine-grained granite that occurs in dykes is called elvan, which, if easiest
to work, is most liable to decay. In Cornwall the elvan of Pentewan was used
for the fine church of S. Austell, and as a consequence the weather has
gnawed it away, and the greater part has had to be renewed. On the other
hand, the splendid elvan of Haute Vienne has supplied the cathedral of
Limoges with a fine-grained material that has been carved like lace, and lasts
well.
The drift that swept over the land would appear to have been from west to
east, with a trend to the south, as no granite has been transported, except in
the river-beds to the north or west, whereas blocks have been conveyed
eastward. This is in accordance with what is shown by the long ridges of clay
on the west of Dartmoor, formed of the rubbing down of the slaty rocks that
lie north and north-west. These bands all run north and south on the sides of
hills, and in draining processes they have to be pierced from east to west.
This indicates that at some period during the Glacial Age there was a wash of
water from the north-west over Devon, depositing clay and transporting
granite.
On the sides of the tors are what are locally termed "clitters" or "clatters"
(Welsh clechr), consisting of a vast quantity of stone strewn in streams from
the tors, spreading out fanlike on the slopes. These are the wreckage of the
tor when far higher than it is now, i.e. of the harder portions that have not
been dissolved and swept away.
"The tors—Nature's towers—are huge masses of granite on the
top of the hills, which are not high enough to be called mountains,
piled one upon another in Nature's own fantastic way. There may
be a tor, or a group of tors, crowning an eminence, but the effect,
either near or afar, is to give the hilltop a grand and imposing
look. These large blocks of granite, poised on one another, some
appearing as if they must fall, others piled with curious regularity
—considering they are Nature's work—are the prominent features
in a Dartmoor landscape, and, wild as parts of Dartmoor are, the
tors add a notable picturesque effect to the scene. There are very
fine tors on the western side of the moor. Those on the east and
south are not so fine as those on the north and west. In the
centre of the moor there are also fine tors. They are, in fact, very
numerous, for nearly every little hill has its granite cap, which is a
tor, and every tor has its name. Some of the high hills that are tor-
less are called beacons, and were doubtless used as signal
beacons in times gone by. As the tors are not grouped or built
with any design by Nature to attract the eye of man, they are the
more attractive on that account, and one of their consequent
peculiarities is that from different points of view they never appear
the same. There can be no sameness in a landscape of tors when
every tor changes its features according to the point of view from
which you look at it. Every tor also has its heap of rock at its feet,
some of them very striking jumbles of blocks of granite scattered
in great confusion between the tor and the foot of the hill. Fur Tor,
which is in the very wildest spot on Dartmoor, and is one of the
leading tors, has a clitter of rocks on its western side as
remarkable as the tor itself; Mis Tor, also on its western side, has
a very fine clitter of granite; Leather Tor stands on the top of a
mass of granite rocks on its east and south sides; and Hen Tor, on
the south quarter, is surrounded with blocks of granite, with a
hollow like the crater of a volcano, as if they had been thrown up
by a great convulsion of Nature. Hen Tor is remarkable chiefly for
this wonderful mass of granite blocks strewn around it. All the
moor has granite boulders scattered about, but they accumulate
at the feet of the tors as if for their support."[3]
VIXEN TOR
Here among the clitters, where they form caves, a search may be made for
the beautiful moss Schistostega osmundacea. It has a metallic lustre like
green gold, and on entering a dark place under rocks, the ground seems to be
blazing with gold. In Germany the Fichtel Gebirge are of granite, and the
Luchsen Berg is so called because there in the hollow under the rocks grew
abundance of the moss glittering like the eyes of a lynx. The authorities of
Alexanderbad have had to rail in the grottoes to prevent the gold moss from
being carried off by the curious. Murray says of these retreats of the luminous
moss:—
"The wonder of the place is the beautiful phosphorescence which
is seen in the crannies of the rocks, and which appears and
disappears according to the position of the spectator. This it is
which has given rise to the fairy tales of gold and gems with which
the gnomes and cobolds tantalise the poor peasants. The light
resembles that of glow-worms; or, if compared to a precious
stone, it is something between a chrysolite and a cat's-eye, but
shining with a more metallic lustre. On picking up some of it, and
bringing it to the light, nothing is found but dirt."
Professor Lloyd found that the luminous appearance was due to the presence
of small crystals in the structure which reflect the light. Coleridge says:—
"'Tis said in Summer's evening hour,
Flashes the golden-coloured flower,
A fair electric light."
In 1843, when the luminosity of plants was recorded in the Proceedings of the
British Association, Mr. Babington mentioned having seen in the south of
England a peculiar bright appearance produced by the presence of the
Schistostega pennata, a little moss which inhabited caverns and dark places:
but this was objected to on the ground that the plant reflected light, and did
not give it off in phosphorescence.[4]
When lighted on, it has the appearance of a handful of emeralds or
aquamarine thrown into a dark hole, and is frequently associated with the bright
green liverwort. Parfitt, in his Moss Flora of Devon, gives it as osmundacea,
not as pennata. It was first discovered in Britain by a Mr. Newberry, on the
road from Zeal to South Tawton; it is, however, to be found in a good many
places, as Hound Tor, Widdecombe, Leather Tor, and in the Swincombe valley,
also in a cave under Lynx Tor. If found, please to leave alone. Gathered it is
invisible; the hand or knife brings away only mud.
But what all are welcome to go after is that which is abundant on every
moorside—but nowhere finer than on such as have not been subjected to
periodical "swaling" or burning. I refer to the whortleberry. This delicious fruit,
eaten with Devonshire cream, is indeed a delicacy. A gentleman from London
was visiting me one day. As he was fond of good things, I gave him
whortleberry and cream. He ate it in dead silence, then leaned back in his
chair, looked at me with eyes full of feeling, and said, "I am thankful that I
have lived to this day."
The whortleberry is a good deal used in the south of France for the
adulteration and colouring of claret, whole truck-loads being imported from
Germany.
There is an interesting usage in my parish, and I presume the same exists in
others. On one day in summer, when the "whorts" are ripe, the mothers unite
to hire waggons of the farmers, or borrow them, and go forth with their little
ones to the moor. They spend the day gathering the berries, and light their
fires, form their camp, and have their meals together, returning late in the
evening, very sunburnt, with very purple mouths, very tired maybe, but vastly
happy, and with sufficient fruit to sell to pay all expenses and leave something
over.
If the reader would know what minerals are found on Dartmoor he must go
elsewhere.
I have a list before me that begins thus: "Allophane, actinolite, achroite,
andalusite, apatite"—but I can copy out no more. I have often found appetite
on Dartmoor, but have not the slightest suspicion as to what is apatite. The
list winds up with wolfram, about which I can say something. Wolfram is a
mineral very generally found along with tin, and that is just the "cussedness"
of it, for it spoils tin.
When tin ore is melted at a good peat fire, out runs a silver streak of metal.
This is brittle as glass, because of the wolfram in it. To get rid of the wolfram
the whole has to be roasted, and the operation is delicate, and must have
bothered our forefathers considerably. By means of this second process the
wolfram, or tungsten as it is also called, is got rid of.
Now, it is a curious fact that the tin of Dartmoor is of extraordinary purity; it
has little or none of this abominable wolfram associated with it, so that it is by
no means improbable that the value of tin as a metal was discovered on
Dartmoor, or in some as yet unknown region where it is equally unalloyed.
In Cornwall all the tin is mixed with tungsten. Now this material has been
hitherto regarded as worthless; it has been sworn at by successive
generations of miners since mining first began. But all at once it has leaped
into importance, for it has been discovered to possess a remarkable property
of hardening iron, and is now largely employed for armour-plated vessels.
From being worth nothing it has risen to a rapidly rising value, as we are
R. Rajesh, B. Mathivanan (Eds.)
Communication and Power Engineering
Editors
Dr. R. Rajesh
Central University of Kerala
India
kollamrajeshr@gmail.com
Dr. B. Mathivanan
Sri Ramakrishna Engg. College
India
mathivanan.bala@srec.ac.in
ISBN 978-3-11-046860-1
e-ISBN (PDF) 978-3-11-046960-8
Set-ISBN 978-3-11-046961-5
Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
© 2016 Walter de Gruyter GmbH, Berlin/Boston
Printing and binding: CPI books GmbH, Leck
Cover image: Thinkstock/tStockbyte
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Committees
Honorary Chair
Dr. Shuvra Das (University of Detroit Mercy, USA)
Dr. Jiguo Yu (Qufu Normal University, China)
Technical Chair
Dr. Sumeet Dua (Louisiana Tech University, USA)
Dr. Amit Banerjee (The Pennsylvania State University, USA)
Dr. Narayan C Debnath (Winona State University, USA)
Dr. Xiaodi Li (Shandong Normal University, China)
Technical Co-Chair
Dr. Natarajan Meghanathan (Jackson State University, USA)
Dr. Hicham Elzabadani (American University in Dubai)
Dr. Shahrokh Valaee (University of Toronto, Canada)
Chief Editors
Dr. Rajesh R (Central University of Kerala, India)
Dr. B Mathivanan (Sri Ramakrishna Engg. College, India)
General Chair
Dr. Janahanlal Stephen (Matha College of Technology, India)
Dr. Yogesh Chaba (Guru Jambeswara University, India)
General Co-Chair
Prof. K. U Abraham (Holykings College of Engineering, India)
Publicity Chair
Dr. Amit Manocha (Maharaja Agrasen Institute of Technology, India)
Finance Chair
Dr. Gylson Thomas (Jyothi Engineering College, India)
Dr. Ilias Maglogiannis (University of Central Greece)
Publicity Co-Chair
Prof. Ford Lumban Gaol (University of Indonesia)
Dr. Amlan Chakrabarti (University of Calcutta, India)
Prof. Prafulla Kumar Behera, PhD (Utkal University, India)
Publication Chair
Dr. Vijayakumar (NSS Engg. College, India)
Dr. T.S.B. Sudarshan (BITS Pilani, India)
Dr. KP Soman (Amritha University, India)
Prof. N. Jaisankar (VIT University, India)
Dr. Rajiv Pandey (Amity University, India)
Program Committee Chair
Dr. Harry E. Ruda (University of Toronto, Canada)
Dr. Deepak Laxmi Narasimha (University of Malaya, Malaysia)
Dr. N. Nagarajan (Anna University, Coimbatore, India)
Prof. Akash Rajak (Krishna Institute of Engg. & Tech., UP, India)
Prof. M Ayoub Khan (CDAC, NOIDA, India)
Programming Committee
Prof. Shelly Sachdeva (Jaypee Institute of Information & Technology University, India)
Prof. PRADHEEP KUMAR K (SEEE, India)
Mrs. Rupa Ashutosh Fadnavis (Yeshwantrao Chavan College of Engineering, India)
Dr. Shu-Ching Chen (Florida International University, USA)
Dr. Stefan Wagner (Fakultät für Informatik Technische Universität München, Boltzmannstr)
Prof. Juha Puustjärvi (Helsinki University of Technology)
Dr. Selwyn Piramuthu (University of Florida)
Dr. Werner Retschitzegger (University of Linz, Austria)
Dr. Habibollah Haro (Universiti Teknologi Malaysia)
Dr. Derek Molloy (Dublin City University, Ireland)
Dr. Anirban Mukhopadhyay (University of Kalyani, India)
Dr. Malabika Basu (Dublin Institute of Technology, Ireland)
Dr. Tahseen Al-Doori (American University in Dubai)
Dr. V. K. Bhat (SMVD University, India)
Dr. Ranjit Abraham (Armia Systems, India)
Dr. Naomie Salim (Universiti Teknologi Malaysia)
Dr. Abdullah Ibrahim (Universiti Malaysia Pahang)
Dr. Charles McCorkell (Dublin City University, Ireland)
Dr. Neeraj Nehra (SMVD University, India)
Dr. Muhammad Nubli (Universiti Malaysia Pahang)
Dr. Zhenyu Y Angz (Florida International University, USA)
Dr. Keivan Navi (Shahid Beheshti University,
Preface
It is my proud privilege to welcome you all to the joint International Conferences organized by IDES. This conference is jointly organized by IDES and the Association of Computer Electrical Electronics and Communication Engineers (ACEECom). The primary objective of this conference is to promote research and developmental activities in Computer Science, Electrical, Electronics, Network, Computational Engineering, and Communication. Another objective is to promote the interchange of scientific information between researchers, developers, engineers, students, and practitioners working in India and abroad. I am very excited to see research papers from various parts of the world. These proceedings bring together research papers from diverse areas of Computer Science, Electrical, Electronics, Network, Computational Engineering, and Communication. This conference is intended to provide a common platform for researchers, academicians and professionals to present their ideas and innovative practices and to explore future trends and applications in the field of science and engineering. It also provides a forum for the dissemination of experts' domain knowledge. The papers included in the proceedings are peer-reviewed scientific and practitioners' papers, reflecting the variety of advances in communication, network, electrical, electronics, and computing. As a Chief Editor of these joint conference proceedings, I would like to thank all of the presenters who made this conference so interesting and enjoyable. Special thanks are also extended to the session chairs and the reviewers who gave their time to evaluate the record number of submissions. To all of the members of the various committees, I owe a great debt, as this conference would not have been possible without their constant efforts. We hope that all of you enjoy reading these selections as much as we enjoyed the conference.
Dr. B Mathivanan
Sri Ramakrishna Engineering College, India
Table of Contents
Foreword
Pawan Kumar Singh, Iman Chatterjee, Ram Sarkar and Mita Nasipuri: Handwritten Script Identification from Text Lines
Neelotpal Chakraborty, Samir Malakar, Ram Sarkar and Mita Nasipuri: A Rule based Approach for Noun Phrase Extraction from English Text Document
Jaya Gera and Harmeet Kaur: Recommending Investors using Association Rule Mining for Crowd Funding Projects
P. S. Hiremath and Rohini A. Bhusnurmath: Colour Texture Classification Using Anisotropic Diffusion and Wavelet Transform
I. Thamarai and S. Murugavalli: Competitive Advantage of using Differential Evolution Algorithm for Software Effort Estimation
Shilpa Gopal and Dr. Padmavathi.S: Comparative Analysis of Cepstral analysis and Autocorrelation Method for Gender Classification
P Ravinder Kuma, Dr Sandeep.V.M and Dr Subhash S Kulkarni: A Simulative Study on Effects of Sensing Parameters on Cognitive Radio’s Performance
Priyanka Parida, Tejaswini P. Deshmukh, and Prashant Deshmukh: Analysis of Cyclotomic Fast Fourier Transform by Gate level Delay Method
Liji P I and Bose S: Dynamic Resource Allocation in Next Generation Networks using FARIMA Time Series Model
Ms Shanti Swamy, Dr.S.M.Asutkar and Dr.G.M.Asutkar: Classification of Mimetite Spectral Signatures using Orthogonal Subspace Projection with Complex Wavelet Filter Bank based Dimensionality Reduction
Sharmila Kumari M, Swathi Salian and Sunil Kumar B. L: An Illumination Invariant Face Recognition Approach based on Fourier Spectrum
Arlene Davidson R and S. Ushakumari: Optimal Load Frequency Controller for a Deregulated Reheat Thermal Power System
Chandana B R and A M Khan: Design and Implementation of a Heuristic Approximation Algorithm for Multicast Routing in Optical Networks
Sneha Sharma and P Beaulah Soundarabai: Infrastructure Management Services Toolkit
Divyansh Goel, Agam Agarwal and Rohit Rastogi: A Novel Approach for Residential Society Maintenance Problem for Better Human Life
H. Kavitha, Montu Singh, Samrat Kumar Rai and Shakeelpatel Biradar: Smart Suspect Vehicle Surveillance System
Takahito Kimura and Shin-Ya Nishizaki: Formal Performance Analysis of Web Servers using an SMT Solver and a Web Framework
Jisha P Abraham and Dr.Sheena Mathew: Modified GCC Compiler Pass for Thread-Level Speculation by Modifying the Window Size using OpenMP
Liu and Baiocchi: Overview and Evaluation of an IoT Product for Application Development
A.Senthamaraiselvan and Ka.Selvaradjou: A TCP in CR-MANET with Unstable Bandwidth
Morande Swapnil and Tewari Veena: Impact of Digital Ecosystem on Business Environment
Narayan Murthy: A Two-Factor Single Use Password Scheme
Dr.Ramesh k: Design & Implementation of Wireless System for Cochlear Devices
Gurunadha Rao Goda and Dr. Avula Damodaram: Software Code Clone Detection and Removal using Program Dependence Graphs
Dileep Kumar G., Dr. Vuda Sreenivasa Rao, Getinet Yilma and Mohammed Kemal Ahmed: Social Sentimental Analytics using Big Data Tools
J. Prakash and A. Bharathi: Predicting Flight Delay using ANN with Multi-core Map Reduce Framework
Dr.Ramesh K, Dr. Sanjeevkumar K.M and Sheetalrani Kawale: New Network Overlay Solution for Complete Networking Virtualization
Konda.Hari Krishna, Dr.Tapus Kumar, Dr.Y.Suresh Babu, N.Sainath and R.Madana Mohana: Review upon Distributed Facts Hard Drive Schemes throughout Wireless Sensor Communities
Mohd Maroof Siddiqui, Dr. Geetika Srivastava, Prof (Dr) Syed Hasan Saeed and Shaguftah: Detection of Rapid Eye Movement Behaviour Sleep Disorder using Time and Frequency Analysis of EEG Signal Applied on C4-A1 Channel
Komal Sunil Deokar and Rajesh Holmukhe: Analysis of PV/WIND/FUEL CELL Hybrid System Interconnected With Electrical Utility Grid
Lipika Nanda and Pratap Bhanu Mishra: Analysis of Wind Speed Prediction Technique by hybrid Weibull-ANN Model
K.Navatha, Dr. J.Tarun Kumar and Pratik Ganguly: An efficient FPGA Implementation of DES and Triple-DES Encryption Systems
Sunil Kumar Jilledi and Shalini J: A Novelty Comparison of Power with Assorted Parameters of a Horizontal Wind Axis Turbine for NACA 5512
Naghma Khatoon and Amritanjali: Retaliation based Enhanced Weighted Clustering Algorithm for Mobile Ad-hoc Network (R-EWCA)
Dr.K.Meenakshi Sundaram and Sufola Das Chagas Silva Araujo: Chest CT Scans Screening of COPD based Fuzzy Rule Classifier Approach
Author Index
Foreword
The Institute of Doctors Engineers and Scientists (IDES), whose objective is to promote research and development activities in the science, medical, engineering and management fields, and the Association of Computer Electrical Electronics and Communication Engineers (ACEECom), whose objective is to disseminate knowledge and to promote research and development activities in the engineering and technology fields, have joined hands for the benefit of society. For more than a decade, both IDES and ACEECom have been well established in organizing conferences and publishing journals. This joint International Conference, organized by IDES and ACEECom in 2016, aims to bring together professors, researchers, and students in all areas of Computer Science, Information Technology, Computational Engineering, Communication, Signal Processing, Power Electronics, Image Processing, etc. on one platform, where they can interact and share ideas. A total of 35 eminent scholars/speakers have registered their papers in the areas of Computer Science and Electrical & Electronics. These papers are published in proceedings by De Gruyter Digital Library and are expected to open new directions for further research in these areas. The organizations (IDES and ACEECom) will come together again in the future for further exposure of this ongoing research.
Dr. Rajesh R
Central University of Kerala, India
Pawan Kumar Singh¹, Iman Chatterjee², Ram Sarkar³ and Mita Nasipuri⁴
¹ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India (pawansingh.ju@gmail.com)
² Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India (imanchatterjee9@gmail.com)
³ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India (raamsarkar@gmail.com)
⁴ Department of Computer Science and Engineering, Jadavpur University, Kolkata, India (mitanasipuri@gmail.com)
Handwritten Script Identification from Text Lines
Abstract: In a multilingual country like India, where 12 different official scripts are in use, automatic identification of handwritten script facilitates many important applications, such as automatic transcription of multilingual documents, searching for documents containing a particular script on the web or in digital archives, and selecting a script-specific Optical Character Recognition (OCR) system in a multilingual environment. In this paper, we propose a robust method for identifying scripts in handwritten documents at the text-line level. The recognition is based upon features extracted using the Chain Code Histogram (CCH) and the Discrete Fourier Transform (DFT). The proposed method is evaluated on 800 handwritten text lines written in seven Indic scripts, namely Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu and Urdu, along with the Roman script, and yields an average identification rate of 97.03% using a Support Vector Machine (SVM) classifier.
Keywords: Script Identification, Handwritten text lines, Indic scripts, Chain Code Histogram, Discrete Fourier Transform, Multiple Classifiers
1 Introduction
One of the major research thrusts in Document Image Analysis is the implementation of OCR algorithms that convert the alphanumeric characters present in a digitized document into a machine-readable form. Examples of applications of such research include automated word recognition, bank check processing, and address sorting in postal applications. The vast majority of the OCR algorithms used in these applications are selected based upon a priori knowledge of the script and/or language of the document under analysis. This assumption requires human intervention to select the appropriate OCR algorithm, which limits the possibility of completely automating the analysis process, especially in a truly multilingual environment. In this scenario, a script recognition module is needed before a document is passed to the appropriate OCR system.
In general, script identification can be performed at any of three levels: (a) page level, (b) text-line level and (c) word level. Compared with page-level or word-level identification, script recognition at the text-line level in a multi-script document may be more challenging, but it has its own advantages. To reliably identify the script type, a certain amount of textual data is needed. Identifying the script of words containing only a few characters is not always feasible, because at word level a single word may not carry enough information. In addition, word-level script identification requires exact segmentation of text words, which is itself a demanding task. On the other hand, identifying scripts at page level can sometimes be too convoluted and time-consuming. It is therefore preferable to perform script identification at the text-line level rather than at either of the other two levels.
A detailed state-of-the-art survey of Indic script identification by P. K. Singh et al. [1] shows that most of the reported studies [2-8] that accomplish script identification at the text-line level work on printed text documents. G. D. Joshi et al.
[2] proposed a hierarchical script classifier that uses a two-level, tree-based scheme for identifying 10 printed Indic scripts, namely Bangla, Devanagari, Gujarati, Gurumukhi, Kannada, Malayalam, Oriya, Tamil and Urdu, including the Roman script. Three feature sets (statistical, local and horizontal profile features) are extracted from the normalized energy of log-Gabor filters designed at 8 equi-spaced orientations (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135° and 157.5°) and at an empirically determined optimal scale. An overall classification accuracy of 97.11% is obtained. M. C. Padma et al. [3] proposed a model based on top and bottom profile features to identify and separate text lines of Telugu, Devnagari and English scripts in a printed trilingual document. A set of eight features (bottom-max-row, top-horizontal-line, tick-component and bottom-component, extracted from the bottom portion of the input text line, together with top-pipe-size, bottom-pipe-size, top-pipe-density and bottom-pipe-density) is computed experimentally, and the overall accuracy of the system is found to be 99.67%. M. C. Padma et al. [4] also proposed a model to identify the script type of a trilingual document printed in Kannada, Hindi and English scripts. The distinctive features of these scripts are studied from the nature of their top and bottom profiles. A set of 4 features, namely profile_value (computed as the density of the pixels present at top_max_row and bottom_max_row), bottom_max_row_no (the value of the attribute bottom_max_row), coeff_profile and top_component_density (the density of the connected components at top_max_row), is computed. Finally, a k-NN (k-Nearest Neighbor) classifier is used to classify the test samples, with an average recognition rate of 99.5%. R. Gopakumar et al. [5] described a zone-based structural feature extraction algorithm for the recognition of South Indic scripts (Kannada, Telugu, Tamil and Malayalam) along with English and Hindi. A set of 9 features, namely the numbers of horizontal lines, vertical lines, right diagonals and left diagonals, the normalized lengths of horizontal lines, vertical lines, right diagonals and left diagonals, and the normalized area of the line image, is computed for each text-line image. Classification accuracies of 100% and 98.3% are achieved using k-NN and SVM (Support Vector Machine), respectively. M. Jindal et al. [6] proposed a script identification approach for Indic scripts at the text-line level based upon features extracted using the Discrete Cosine Transform (DCT) and Principal Component Analysis (PCA). The method is tested on printed document images in 11 major Indian languages (viz., Bangla, Hindi, Gujarati, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu, English and Urdu), and a recognition accuracy of 95% is obtained. R. Rani et al. [7] presented the effectiveness of Gabor filter banks with k-NN, SVM and PNN (Probabilistic Neural Network) classifiers for identifying scripts at the text-line level in trilingual documents printed in Gurumukhi, Hindi and English.
The experiment shows that a set of 140 Gabor-filter features with the SVM classifier achieves a maximum recognition rate of 99.85%. I. Kaur et al. [8] presented a method for identifying English and Punjabi scripts at the text-line level using headline and character density features. The approach is tested on images of different font sizes, and an average accuracy of 90.75% is achieved. In contrast, only a few studies address handwritten documents. M. Hangarge et al. [9] investigated texture patterns as a tool for determining the script of a handwritten document image, based on the observation that text has a distinct visual texture. A set of 13 spatial spread features of three scripts, namely English, Devanagari and Urdu, is extracted using morphological filters, and the overall accuracies of the proposed algorithm are found to be 88.67% and 99.2% for tri-script and bi-script classification, respectively, using a k-NN classifier. P. K. Singh et al. [10] proposed a texture-based approach for text-line-level script identification of six handwritten scripts, namely Bangla, Devanagari, Malayalam, Tamil, Telugu and Roman. A set of 80 features
based on the Gray Level Co-occurrence Matrix (GLCM) is used, and an overall recognition rate of 95.67% is achieved using a Multi Layer Perceptron (MLP) classifier. To the best of our knowledge, script identification at the text-line level covering a large number of handwritten Indic scripts does not exist in the literature. In this paper, we propose a text-line-level script identification technique for handwritten text written in seven popular official Indic scripts, namely Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu and Urdu, along with the Roman script.
2 Data Collection and Preprocessing
At present, no standard database of handwritten Indic scripts is available in the public domain. Hence, we created our own database of handwritten documents in the laboratory. The document pages for the database were written by different persons on request under our supervision. The writers were asked to write on A4-size pages, without any constraint on the content of the text. The document pages are digitized at 300 dpi resolution and stored as gray-tone images. The scanned images may contain noisy pixels, which are removed by applying a Gaussian filter [11]. Note that each handwritten text line (actually, an arbitrarily chosen portion of the line) may contain two or more words with noticeable intra- and inter-word spacing. Numerals that may appear in the text are not considered in the present work. It is ensured that at least 50% of each cropped text line contains text. A sample snapshot of text-line images written in eight different scripts is shown in Fig. 1. Otsu's global thresholding approach [12] is used to convert the images into two-tone images. However, the dots and punctuation marks appearing in the text lines are not eliminated, since these may also contribute to the features of the respective scripts.
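Otsu's method [12] picks the global threshold that maximizes the between-class variance of the gray-level histogram. A minimal NumPy sketch is given below, assuming an 8-bit gray image that has already been smoothed (the Gaussian filtering step is not reproduced) and dark ink on a light background; the function names `otsu_threshold` and `binarize` are illustrative and are not taken from the paper.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level t that maximizes Otsu's between-class variance.

    Pixels <= t form one class and pixels > t the other. `gray` must hold
    non-negative integers (e.g. dtype uint8).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(hist.size)
    p = hist / hist.sum()                  # normalized histogram
    omega = np.cumsum(p)                   # probability of the class "<= t"
    mu = np.cumsum(p * levels)             # cumulative mean up to t
    mu_total = mu[-1]                      # global mean gray level
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    sigma_b2[denom == 0] = 0.0             # one class empty: not a valid split
    return int(np.argmax(sigma_b2))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Two-tone image: True for (dark) ink pixels, False for background."""
    return gray <= otsu_threshold(gray)
```

For a synthetic image whose left half is gray level 50 and right half is 200, every threshold between 50 and 199 separates the two halves equally well, and `np.argmax` returns the first of them, 50.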
Finally, a total of 800 handwritten text-line images are considered, with exactly 100 text lines per script.
3 Feature Extraction
The feature extraction is based on a combination of the Chain Code Histogram (CCH) and the Discrete Fourier Transform (DFT), which are described in detail in the following subsections.
Figure 1. Sample text line images taken from our database written in: (a) Gujarati, (b) Kannada, (c) Malayalam, (d) Oriya, (e) Tamil, (f) Telugu, (g) Urdu, and (h) Roman scripts respectively
3.1 Chain Code Histogram
Chain codes [11] are used to represent a boundary by a connected sequence of straight-line segments of specified length and direction. A chain code describes the movement along a digital curve or a sequence of pixels based on their connectivity. Two types of chain codes are possible, depending on whether a pixel is considered to have four or eight neighbors (4- or 8-neighbourhood). The corresponding codes are the 4-directional code and the 8-directional code, respectively. The direction of each segment is coded using the numbering scheme shown in Fig. 2. In the present work, the boundaries of handwritten text lines written in different scripts are traced, and each step along a boundary is assigned the number of its direction. Thus, the boundary of each text line is reduced to a sequence of numbers. A boundary code formed as a sequence of such directional numbers is referred to as a Freeman chain code.
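Once a boundary has been traced, converting it to an 8-directional Freeman chain code is a lookup on the step between consecutive boundary pixels. The sketch below assumes the boundary is already available as an ordered list of 8-connected (row, column) pixels (the tracing itself is not shown) and numbers directions counterclockwise from east with rows growing downwards, so that the codes agree with the angles listed in Table 1; `freeman_chain_code` is a hypothetical helper name.

```python
# Step (d_row, d_col) -> Freeman code. Rows grow downwards and codes run
# counterclockwise from east: 0 = E, 1 = NE, 2 = N, ..., 7 = SE.
STEP_TO_CODE = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def freeman_chain_code(boundary, closed=True):
    """Return the 8-directional chain code of an ordered boundary.

    `boundary` is a list of (row, col) pixels in which consecutive points
    are 8-neighbours. If `closed`, the step from the last point back to the
    first is included.
    """
    points = list(boundary)
    if closed:
        points.append(points[0])
    codes = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in STEP_TO_CODE:
            raise ValueError(f"{(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(STEP_TO_CODE[step])
    return codes
```

Walking around a 2 × 2 block of pixels starting at its top-left corner, `freeman_chain_code([(0, 0), (0, 1), (1, 1), (1, 0)])` moves east, south, west and north, giving `[0, 6, 4, 2]`.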
Figure 2. Illustration of numbering the directions for: (a) 4-directional, and (b) 8-directional chain codes
The histogram of the Freeman chain codes is taken as feature values F1-F8, and the histogram of the first difference of the chain codes is taken as feature values F9-F15. Let $R$ denote the set of pixels of a region. The perimeter of $R$ is the number of pixels on the boundary of $R$; in a binary image, it is the number of foreground pixels that touch the background. For an 8-directional code, the perimeter length of each text line (F16) is calculated as:
$$|P| = n_{\text{even}} + \sqrt{2}\, n_{\text{odd}}$$
where $n_{\text{even}}$ and $n_{\text{odd}}$ are the numbers of even-valued (horizontal or vertical) and odd-valued (diagonal) codes in the chain.
A circularity measure (F17) proposed by Haralick [13] can be written as:
$$C = \frac{\mu_R}{\sigma_R} \qquad (1)$$
where $\mu_R$ and $\sigma_R$ are the mean and standard deviation of the distance from the centroid $(\bar{r}, \bar{c})$ of the shape to the shape boundary, and can be computed as follows:
$$\mu_R = \frac{1}{K}\sum_{k=0}^{K-1}\left\|(r_k, c_k) - (\bar{r}, \bar{c})\right\| \qquad (2)$$
$$\sigma_R = \left(\frac{1}{K}\sum_{k=0}^{K-1}\left(\left\|(r_k, c_k) - (\bar{r}, \bar{c})\right\| - \mu_R\right)^2\right)^{1/2} \qquad (3)$$
where the pixels $(r_k, c_k)$, $k = 0, \ldots, K-1$, lie on the perimeter $P$ of the region. The circularity measure increases monotonically as the digital shape becomes more circular and is similar for digital and continuous shapes. Along the boundary, the slopes are labeled in accordance with their chain codes, as shown in Table 1.
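The chain-code features F1-F22 can be computed directly from a code sequence. The following is a minimal sketch, not the authors' implementation: it assumes a closed 8-directional code, computes the first difference cyclically as (c[k+1] - c[k]) mod 8, and takes the centroid in Eqns. (2)-(3) from all pixels of the region. The paper lists seven first-difference features (F9-F15) without stating which bin is left out, so the sketch returns all eight bins. The slope counts group codes by |θ| using the labeling of Table 1 (codes 1 and 7 both give |45°|, and so on); the function name is hypothetical.

```python
import math


def chain_code_features(codes, boundary, region):
    """Hypothetical sketch of the CCH features F1-F22.

    codes    -- closed 8-directional Freeman chain code (ints 0..7)
    boundary -- (row, col) pixels on the perimeter, used in Eqns. (2)-(3)
    region   -- all (row, col) pixels of the shape, used for the centroid
    """
    # F1-F8: histogram of chain codes
    code_hist = [codes.count(d) for d in range(8)]
    # First difference (cyclic); unchanged if every code is shifted by the same amount
    diffs = [(b - a) % 8 for a, b in zip(codes, codes[1:] + codes[:1])]
    diff_hist = [diffs.count(d) for d in range(8)]
    # F16: even codes are unit steps, odd codes are diagonal steps of length sqrt(2)
    even = sum(1 for c in codes if c % 2 == 0)
    perimeter = even + math.sqrt(2) * (len(codes) - even)
    # F17: Haralick circularity, Eqns. (1)-(3)
    r_bar = sum(r for r, _ in region) / len(region)
    c_bar = sum(c for _, c in region) / len(region)
    dists = [math.hypot(r - r_bar, c - c_bar) for r, c in boundary]
    mu_r = sum(dists) / len(dists)
    sigma_r = math.sqrt(sum((d - mu_r) ** 2 for d in dists) / len(dists))
    circularity = mu_r / sigma_r if sigma_r > 0 else math.inf
    # F18-F22: slope counts for |theta| = 0, 45, 90, 135, 180 degrees (Table 1)
    groups = [(0,), (1, 7), (2, 6), (3, 5), (4,)]
    slope_counts = [sum(code_hist[c] for c in g) for g in groups]
    return {
        "code_hist": code_hist,
        "diff_hist": diff_hist,
        "perimeter": perimeter,
        "circularity": circularity,
        "slope_counts": slope_counts,
    }
```

For the outer ring of a 3 × 3 square traced clockwise from the top-left pixel, the code is `[0, 0, 6, 6, 4, 4, 2, 2]`; the corners lie at distance √2 and the edge midpoints at distance 1 from the centroid, so Eqn. (1) gives C = (√2 + 1)/(√2 - 1) = 3 + 2√2.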
Table 1. Labeling of slope angles according to their chain codes
Chain code | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
θ | 0° | 45° | 90° | 135° | 180° | −135° | −90° | −45°
The counts of the slopes having θ values 0°, |45°|, |90°|, |135°| and 180° for each handwritten text-line image are taken as feature values (F18-F22).
3.2 Discrete Fourier Transform
The Fourier Transform [11] is an important image processing tool that is used to decompose an image into its sine and cosine components. The output of the transformation represents the image in the Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier domain image, each point represents a particular frequency contained in the spatial domain image. The Discrete Fourier Transform (DFT) is the sampled Fourier Transform and therefore does not contain all frequencies forming an image, but only a set of samples that is large enough to fully describe the spatial domain image. The number of frequencies corresponds to the number of pixels in the spatial domain image, i.e., the images in the spatial and Fourier domains are of the same size. The DFT of a digital image $f$ of size $M \times N$ can be written as:
$$F(u, v) = \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \qquad (4)$$
where $f(x, y)$ is the image in the spatial domain and the exponential term is the basis function corresponding to each point $F(u, v)$ in the Fourier space. The value of each point $F(u, v)$ is obtained by multiplying the spatial image with the corresponding basis function and summing the result. The Fourier Transform produces a complex-valued output, which can be displayed as two images: either the real and imaginary parts, or the magnitude and phase. The magnitude indicates how strongly each frequency component is present, while the phase indicates where that component is located in the image. The magnitude and phase components for a sample Tamil handwritten text-line image are shown in Fig. 3.
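Eqn. (4) can be checked numerically by evaluating the double sum literally and comparing it with a fast Fourier transform. The NumPy sketch below assumes the 1/(MN) scaling written in Eqn. (4); `numpy.fft.fft2` omits that factor, so its output is divided by the number of pixels before the comparison. The direct loop costs O((MN)²) operations and is meant only as a check on small arrays; `dft2_direct` is an illustrative name.

```python
import numpy as np


def dft2_direct(f: np.ndarray) -> np.ndarray:
    """Evaluate Eqn. (4) literally for an M x N image f."""
    f = np.asarray(f, dtype=float)
    M, N = f.shape
    x = np.arange(M).reshape(M, 1)   # row index of every pixel
    y = np.arange(N).reshape(1, N)   # column index of every pixel
    F = np.zeros((M, N), dtype=complex)
    for u in range(M):
        for v in range(N):
            # Basis function for frequency (u, v), evaluated at every pixel
            basis = np.exp(-2j * np.pi * (u * x / M + v * y / N))
            F[u, v] = (f * basis).sum() / (M * N)
    return F


rng = np.random.default_rng(0)
f = rng.random((6, 8))
F = dft2_direct(f)
magnitude, phase = np.abs(F), np.angle(F)   # the two images discussed above
```

With this scaling, F(0, 0) equals the mean gray level of the image, and `F` agrees with `np.fft.fft2(f) / f.size` up to rounding.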
In the current work, only the magnitude part of the DFT is employed, as it contains most of the information about the geometric structure of the spatial domain image. This also makes it easy to examine or process particular frequencies of the image. The magnitude coefficient is normalized as follows:
$$\hat{F}(u, v) = \frac{|F(u, v)|}{\sum_{u,v} |F(u, v)|} \qquad (5)$$
The algorithm for feature extraction using the DFT is as follows:
Step 1: Divide the input text-line image into n × n non-overlapping blocks, known as grids. The optimal value of n has been chosen as 4.
Step 2: Compute the DFT (by applying Eqn. (4)) in each of the grids.
Step 3: Retain only the magnitude part of the DFT and normalize it using Eqn. (5).
Step 4: Calculate the mean and standard deviation of the normalized magnitude in each of the grids, which gives a feature vector of 32 elements (F23-F54).
Figure 3. Illustration of: (a) handwritten Tamil text-line image, (b) its magnitude component, and (c) its phase component after applying DFT
4 Experimental Results and Discussion
The performance of the present script identification scheme is evaluated on a dataset of 800 preprocessed text-line images, as described in Section 2. For each set of 100 text-line images of a particular script, 65 images are used for training and the remaining 45 images are used for testing. The proposed approach is evaluated using seven well-known classifiers, namely Naïve Bayes, Bayes Net, MLP, SVM, Random Forest, Bagging and MultiClass Classifier. The recognition performances and the corresponding scores achieved at the 95% confidence level are shown in Table 2.
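The four DFT steps of Section 3.2 can be sketched with NumPy as follows. This is an illustrative reading rather than the authors' code: it assumes a 2-D binary or gray image, uses `numpy.array_split` so that image sizes not divisible by n still yield n × n grids, applies Eqn. (5) within each grid, leaves all-blank grids as zeros, and orders the 32 outputs as the 16 grid means followed by the 16 grid standard deviations (the paper does not state an order). The function name is hypothetical.

```python
import numpy as np


def dft_grid_features(img: np.ndarray, n: int = 4) -> np.ndarray:
    """Mean and std of the normalized DFT magnitude in each of n x n grids."""
    img = np.asarray(img, dtype=float)
    means, stds = [], []
    # Step 1: n x n non-overlapping grids (sizes may differ by one pixel)
    for band in np.array_split(img, n, axis=0):
        for block in np.array_split(band, n, axis=1):
            # Step 2: DFT of the grid with the 1/(MN) factor of Eqn. (4)
            F = np.fft.fft2(block) / block.size
            # Step 3: magnitude only, normalized as in Eqn. (5)
            mag = np.abs(F)
            total = mag.sum()
            norm = mag / total if total > 0 else mag   # blank grid stays zero
            # Step 4: per-grid statistics
            means.append(norm.mean())
            stds.append(norm.std())
    return np.array(means + stds)   # 2 * n * n values; 32 for n = 4
```

Because Eqn. (5) makes the normalized magnitudes of a non-blank grid sum to one, each such grid's mean equals one over its number of pixels under this per-grid reading, so most of the variation between images is carried by the standard deviations.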
Table 2. Recognition performances of the proposed script identification technique using seven well-known classifiers (best case: SVM)
Classifier | Success Rate (%) | 95% confidence score (%)
Naïve Bayes | 89.33 | 91.62
Bayes Net | 90.09 | 93.27
MLP | 95.14 | 96.85
SVM | 97.03 | 99.7
Random Forest | 94.6 | 97.39
Bagging | 91.25 | 93.54
MultiClass Classifier | 92.74 | 95.52
As observed from Table 2, the SVM classifier produces the highest identification accuracy of 97.03%. In the present work, a detailed error analysis of the SVM classifier is also carried out in terms of well-known measures, namely the Kappa statistic, mean absolute error, root mean square error, True Positive rate (TPR), False Positive rate (FPR), precision, recall, F-measure, Matthews Correlation Coefficient (MCC) and Area Under the ROC curve (AUC). The values of the Kappa statistic, mean absolute error and root mean square error of the SVM classifier for the present technique are found to be 0.9661, 0.0074 and 0.0862, respectively. Table 3 provides a statistical performance analysis of the remaining parameters for each of the scripts.
Table 3. Statistical performance measures, with their weighted averages, achieved by the proposed technique for eight handwritten scripts
Script | TP rate | FP rate | Precision | Recall | F-measure | MCC | AUC
Gujarati | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Kannada | 0.970 | 0.025 | 0.845 | 0.970 | 0.903 | 0.891 | 0.972
Malayalam | 0.950 | 0.000 | 1.000 | 0.950 | 0.975 | 0.972 | 0.975
Oriya | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Tamil | 0.990 | 0.000 | 1.000 | 0.990 | 0.995 | 0.994 | 0.995
Telugu | 0.980 | 0.000 | 1.000 | 0.980 | 0.990 | 0.989 | 0.990
Urdu | 0.941 | 0.004 | 0.969 | 0.941 | 0.955 | 0.949 | 0.968
Roman | 0.931 | 0.004 | 0.969 | 0.931 | 0.949 | 0.943 | 0.963
Weighted Average | 0.970 | 0.004 | 0.973 | 0.970 | 0.971 | 0.967 | 0.983
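The per-script measures in Table 3 follow from a multi-class confusion matrix by treating each script in turn as the positive class. A minimal sketch is given below; the two-class confusion matrix used to exercise it is a made-up example, not data from the paper, and the function names are illustrative. As a consistency check, the F-measure for Kannada recomputed from its precision (0.845) and recall (0.970) gives 2PR/(P + R) ≈ 0.903, matching Table 3.

```python
import math


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    s = precision + recall
    return 2 * precision * recall / s if s else 0.0


def per_class_metrics(cm, k):
    """One-vs-rest measures for class k; cm[i][j] counts true i predicted as j."""
    n = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                      # class k predicted as something else
    fp = sum(row[k] for row in cm) - tp       # other classes predicted as k
    tn = n - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # equals the TP rate
    fpr = fp / (fp + tn) if fp + tn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr,
            "f_measure": f_measure(precision, recall), "mcc": mcc}


def cohen_kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = sum(sum(row) for row in cm)
    k = len(cm)
    p_obs = sum(cm[i][i] for i in range(k)) / n
    p_exp = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)
```

On the made-up matrix `[[8, 2], [1, 9]]`, class 0 has precision 8/9 and recall 0.8, and the observed and chance agreements are 0.85 and 0.5, so kappa is 0.7.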
Although Table 2 shows encouraging results, some handwritten text lines are still misclassified during the experimentation. The main reasons are: (a) the presence of speckled noise, (b) skewed words in some text lines, and (c) the occurrence of irregular spaces within text words, punctuation symbols, etc. The structural resemblance between the character sets of some Indic scripts, such as Kannada and Telugu, and Malayalam and Tamil, causes similarity in the contiguous pixel distribution, which in turn leads to misclassification among them. Fig. 4 shows some samples of misclassified text-line images.
Figure 4. Samples of text line images written in (a) Kannada, (b) Telugu, (c) Malayalam, and (d) Tamil scripts misclassified as Telugu, Kannada, Tamil and Malayalam scripts respectively
Conclusion
In this paper, we have proposed a robust method for handwritten script identification at the text-line level for eight official scripts of India. The aim of this paper is to facilitate research on multilingual handwritten OCR. A set of 54 feature values is extracted using the combination of CCH and DFT. Experimental results show that an accuracy of 97.03% is achieved using the SVM classifier on a limited dataset of eight different scripts, which is quite acceptable considering the complexities and shape variations of the scripts under consideration. In future work, we plan to extend this technique to perform script identification in handwritten document images containing more Indian languages. Another focus is to increase the size of the database to incorporate larger variations of writing styles, which in turn would establish our technique as writer-independent.
Acknowledgment
The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and the Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of the Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The work reported here has been partially funded by the University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.
References
1 P. K. Singh, R. Sarkar, M. Nasipuri, “Offline Script Identification from Multilingual Indic-script Documents: A state-of-the-art”, In: Computer Science Review (Elsevier), vol. 15-16, pp. 1-28, 2015.
2 G. D. Joshi, S. Garg, J. Sivaswamy, “Script Identification from Indian Documents”, In: Lecture Notes in Computer Science: International Workshop on Document Analysis Systems, Nelson, LNCS-3872, pp. 255-267, Feb. 2006.
3 M. C. Padma, P. A. Vijaya, “Identification of Telugu, Devnagari and English scripts using discriminating features”, In: International Journal of Computer Science and Information Technology (IJCSIT), vol. 1, no. 2, Nov. 2009.
4 M. C. Padma, P. A. Vijaya, “Script Identification from Trilingual Documents using Profile based Features”, In: International Journal of Computer Science and Applications (IJCSA), vol. 7, no. 4, pp. 16-33, 2010.
5 R. Gopakumar, N. V. SubbaReddy, K. Makkithaya, U. Dinesh Acharya, “Script Identification from Multilingual Indian documents using Structural Features”, In: Journal of Computing, vol. 2, issue 7, pp. 106-111, July 2010.
6 M. Jindal, N. Hemrajani, “Script Identification for printed document images at text-line level using DCT and PCA”, In: IOSR Journal of Computer Engineering, vol. 12, issue 5, pp. 97-102, 2013.
7 R. Rani, R. Dhir, G. S. Lehal, “Gabor Features Based Script Identification of Lines within a Bilingual/Trilingual Document”, In: International Journal of Advanced Science and Technology, vol. 66, pp. 1-12, 2014.
8 I. Kaur, S. Mahajan, “Bilingual Script Identification of Printed Text Image”, In: International Journal of Engineering and Technology, vol. 2, issue 3, pp. 768-773, June 2015.
9 M. Hangarge, B. V. Dhandra, “Offline Handwritten Script Identification in Document Images”, In: International Journal of Computer Applications (IJCA), vol. 4, no. 6, pp. 1-5, July 2010.
10 P. K. Singh, R. Sarkar, M. Nasipuri, “Line-level Script Identification for six handwritten scripts using texture based features”, In: Proc. of 2nd Information Systems Design and Intelligent Applications, AISC, vol. 340, pp. 285-293, 2015.
11 R. C. Gonzalez, R. E. Woods, “Digital Image Processing”, vol. I, Prentice-Hall, India, 1992.
12 N. Otsu, “A Threshold Selection Method from Gray-Level Histograms”, In: IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, no. 1, pp. 62-66, 1979.
13 R. M. Haralick, “A Measure of Circularity of Digital Figures”, In: IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-4, pp. 394-396, 1974.
  • 33. Neelotpal Chakraborty1 , Samir Malakar2 , Ram Sarkar3 and Mita Nasipuri4 A Rule based Approach for Noun Phrase Extraction from English Text Document Abstract: This paper is an attempt to focus on an approach that is quite simple to implement and efficient enough to extract Noun Phrases (NPs) from text docu- ment written in English. The selected text documents are articles of reputed Eng- lish newspapers of India, namely, The Times of India, The Telegraph and The Statesman. A specific column (sports) has been taken into consideration. The pro- posed approach concentrates on the following objectives: First, to explore and exploit the grammatical features of the language. Second, to prepare an updated stop list classified into conjunctions, prepositions, articles, common verbs and adjectives. Third, to give special characters due importance. Keywords: Noun Phrase, Rule-based Approach, Natural Language Processing, Data Mining 1 Introduction In the past few decades, world has witnessed a huge text data explosion in the form of printed and/or handwritten form. This data growth would increase expo- nentially as time will pass. Also it is well known paradigm that the searching time for some document is directly proportional to the size of the database where it belongs to. Therefore, such abrupt increase of data is eventually increasing the searching time. But the technology enabled society demands a fast and efficient way to reduce the searching time. The searching time can be optimized only when || 1 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India neelotpal_chakraborty@yahoo.com 2 Department of Master of Computer Applications, MCKV Institute of Engineering, Howrah, In- dia malakarkarsamir@gmail.com 3 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India raamsarkar@gmail.com 4 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India mitanasipuri@gmail.com
the data are maintained using some proper structure. One way to accomplish this is document clustering, an application of Natural Language Processing (NLP). The job of NLP is to understand and analyze natural language (the language spoken by humans). The growing volume of documents has motivated a section of the research community throughout the world to direct their research towards NLP. Document clustering is carried out as a sequence of processes comprising Noun Phrase (NP) extraction, Key Phrase (KP) selection and document ranking.
1.1 Noun Phrase, Key Phrase and Document Clustering
Any text document, irrespective of its content, contains certain terminology (words or phrases) by which that particular document can be identified (or classified), out of several documents, as describing a particular subject or topic. The process of assigning a text document to a predefined class or subject is known as document clustering. The terms used to tag a text document with a predefined class are usually called Keywords or KPs, and each consists of a single word or multiple words. Traditionally, a Named Entity (NE), a special type of NP, is an obvious choice of KP.
To date, research on document clustering has mainly focused on developing statistical models to identify NPs, i.e., on identifying quantitative features [1]. However, certain aspects of any natural language require an understanding of its subjective/qualitative features. Each natural language has its own specific grammatical structure. In general, its vocabulary can be divided into two types, called the closed class and the open class [1]. New words are frequently added to the open class, whereas words are rarely added to the closed class.
NPs fall under the open class, while prepositions, articles, conjunctions and certain common verbs, adjectives and adverbs belong to the closed class. However, certain verbs, adverbs and adjectives are derived from nouns, for example: ( ) → ( ). Also, the word that precedes a particular word, such as a preposition, can determine its word class. For example, in the following sentences the word "take" is used as a noun in the first sentence and as a verb in the second.
Sentence 1: Just one take is enough for this scene.
Sentence 2: I have come to take my books.
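The role of the preceding word in the "take" example can be illustrated with a toy Python sketch. It is not the authors' rule set: the cue lists NOUN_CUES and VERB_CUES and the helper guess_word_class are hypothetical, and they look only one word to the left.

```python
# Toy heuristic (hypothetical cue lists): the word right before an ambiguous
# word decides whether it is read as a noun or a verb.
NOUN_CUES = {"a", "an", "the", "one", "this", "that", "his", "her", "my"}
VERB_CUES = {"to", "will", "can", "shall", "must", "i", "we", "they", "you"}


def guess_word_class(sentence: str, target: str) -> str:
    """Return 'noun', 'verb' or 'unknown' for the first occurrence of target."""
    # Lowercase and strip surrounding punctuation so tokens compare cleanly.
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    target = target.lower()
    if target not in tokens:
        return "unknown"
    i = tokens.index(target)
    previous = tokens[i - 1] if i > 0 else ""
    if previous in NOUN_CUES:
        return "noun"
    if previous in VERB_CUES:
        return "verb"
    return "unknown"


print(guess_word_class("Just one take is enough for this scene.", "take"))  # noun
print(guess_word_class("I have come to take my books.", "take"))            # verb
```

A one-word window is enough for these two sentences, but real text would need longer context, which is why the paper combines several levels of stop lists instead of a single cue.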
However, it is worth noting that some verbs, adverbs or adjectives can be derived from nouns and vice versa. English also distinguishes uppercase and lowercase letters, which affects the relevance of terms for a particular text document. Therefore, in the present work, apart from maintaining separate stop lists at different levels, such derived words are converted into their respective noun forms before being considered for NP extraction. Again, it is well known that the human brain can understand the subjective or aesthetic features of a natural language, but these cannot always be handled by statistical or probabilistic models. These characteristics of natural language therefore deserve special consideration. The proposed work extracts NPs from text documents by taking these aesthetic features of the English language into account.
2 Related Work
A number of works [2-19] in the literature aim to extract NPs from text documents. They can broadly be classified into three categories: rule based, statistical model based and hybrid approaches. The first category [2-7] is mainly employed when adequate data are unavailable; it uses a linguistic model of the language. The second category [1, 8-13] does not require linguistic information about the language: these methods are language independent but need sufficient data to work well. The third category [14-19] comprises hybrid approaches in which linguistic features of the language are used along with statistical information to extract NPs. The present work belongs to the first category.
Rule based approaches [2-7] have mainly followed one of two strategies: top-down [2-3] or bottom-up [4-7].
In top-down approaches, sentences are successively divided until the NPs are extracted, whereas in bottom-up approaches, words (the fundamental units of a sentence) are extracted first and rules are then applied to form phrases. The work in [2] used an adaptive window of seven words to extract NPs. The approach in [3] is based on sequence labeling and training by kernel methods that capture the non-linear relationships among the morphological features of the Tamil language. In [4], the authors used a morpheme-based augmented transition network to construct and detect NPs from words. The work described in [5] used a CRF-based mechanism with morphological and contextual features. Another method, described in [6], first extracts the words from the sentence and then
uses a finite state machine to combine them into NPs using Marathi morphology. An N-gram based machine translation mechanism was applied in [7] to extract NPs from English and French. In statistical/quantitative models, words are first extracted and then combined into phrases by a probabilistic model that draws on knowledge from large amounts of data. The work described in [1] used a Hidden Markov Model (HMM) to extract NPs. A probabilistic finite-state automaton was used in [8], and a Support Vector Machine (SVM) based method for NP extraction in [9]. The work described in [10] used a statistical natural language parser trained on non-medical text as an NP extractor. A feed-forward neural network with embedding, hidden and softmax layers [11], long short-term memory (LSTM) recurrent neural networks [12] and multiword expression identification using semantic clustering [13] have been used to parse sentences and tag the NPs in them.
Hybrid methods use a rule-based approach to create a tagged corpus first and then a statistical model to confirm the NPs, or vice versa. The works described in [15-16] use a part-of-speech tagger to create a tagged corpus and then Artificial Immune Systems (AIS) to confirm the final list of NPs for English and Malayalam, respectively, whereas the work in [17] employed hand-crafted rules for corpus preparation and then memory-based learning for NP extraction from Japanese. The method in [18] first uses a rule-based approach to create a tagged corpus for training and then a multilayer perceptron (MLP) based neural network and Fuzzy C-Means clustering. Ref. [19] first employs an HMM to extract NPs at an initial level and then uses rules to purify the final result.
3 Corpus Description
A corpus was prepared to conduct experiments on NP extraction from English text documents. The text documents are news articles from the sports columns of well-known English newspapers: 50 such articles were collected from popular English newspapers in India, namely The Telegraph, The Times of India and The Statesman. The distribution of the documents by newspaper is shown in Fig. 1. The database contains a total of 20,378 words. A stop list was prepared containing words that are highly frequent in all text documents. It includes 49 prepositions, 26 conjunctions, 3 articles, 6 clitics and 682 other stop words, which include common verbs, adjectives, adverbs, etc.
Fig 1. Distribution of the collected data from different newspapers
4 English Morphology
Morphology [1] describes how small meaningful units of a language (morphemes) combine to form words. For example, the morphemes ball and s together make up the word balls. Similarly, the word players is made up of three morphemes: play, er and s. Morphemes can be broadly classified into two major classes: stems and affixes. The stem carries the main/fundamental meaning of the word, and affixes provide additional meanings. Affixes can further be categorized as prefixes (precede a stem, e.g., − ), suffixes (follow a stem, e.g., − ), infixes (inserted within a stem, e.g., − − ) and circumfixes (surround the stem, e.g., en − light − en).
English morphological processes are classified into 4 major types: inflection, derivation, cliticization and compounding, which, along with the related notion of agreement, are detailed in the following subsections. These morphological processes are concatenative in nature.
4.1 Inflectional Morphology
In inflection, a stem combines with a grammatical morpheme to generate a word of the same stem class, typically filling a syntactic function such as agreement (see Section 4.5). The plural form of a noun is usually formed by adding s or es as a suffix to its singular form. English has a relatively small number of inflectional affixes, and only nouns, verbs and certain adjectives can be inflected. Table 1 contains some
examples of noun inflection in the form of regular (suffix -s or -es) and irregular (different spelling) plurals.
Table 1. Regular and irregular plurals
Morphological class | Regular        | Irregular
Singular form       | player, ball   | man, child
Plural form         | players, balls | men, children
Verbal inflection in English is more complex than noun inflection. English generally has three types of verbs: main verbs (e.g., bowl, play), modal verbs (e.g., shall, can) and primary verbs (e.g., has, be). The majority of main verbs are regular: knowing only the stem, one can easily form the other forms by adding the suffixes -s, -ed and -ing. Irregular verbs, however, have idiosyncratic inflectional forms. Some examples of the morphological forms of regular / irregular verbs are given in Table 2.
Table 2. Morphological forms of regular / irregular verbs
Morphological class        | Regular verbs    | Irregular verbs
Stem                       | play, kick       | catch, hit, go
Third-person singular form | plays, kicks     | catches, hits, goes
Present participle form    | playing, kicking | catching, hitting, going
Past form                  | played, kicked   | caught, hit, went
Past participle form       | played, kicked   | caught, hit, gone
4.2 Derivational Morphology
In derivational morphology, a stem combines with a grammatical morpheme to generate a word of a different class. The class of the derived word is difficult to determine automatically, which makes derivation considerably more complex than inflection. In English, new nouns are often derived from verbs or adjectives; such derivations are termed nominalization [1]. Some examples of derived words, including such nominalizations, are given in Table 3.
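The inflection rules behind Tables 1 and 2 can be sketched as a small suffix stripper. This is an illustrative Python sketch, not part of the proposed system: strip_inflection, the IRREGULAR table and the UNDOUBLE set are assumptions, and the rules are deliberately naive (for example, "bring" would wrongly lose its -ing).

```python
# Naive inflection stripper built from Tables 1 and 2 (hypothetical helper,
# not a full lemmatizer). Irregular forms are looked up; regular ones lose
# their -s/-es/-ed/-ing suffix.
IRREGULAR = {
    "men": "man", "children": "child",              # irregular plurals (Table 1)
    "caught": "catch", "went": "go", "gone": "go",  # irregular verb forms (Table 2)
}
UNDOUBLE = set("bgmnpt")  # consonants whose doubling before -ing is undone


def strip_inflection(word: str) -> str:
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ing") and len(w) >= 5:
        stem = w[:-3]
        # hitting -> hitt -> hit
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] in UNDOUBLE:
            stem = stem[:-1]
        return stem
    if w.endswith("ed") and len(w) >= 5:
        return w[:-2]                                # played -> play
    if w.endswith("es") and w[:-2].endswith(("ch", "sh", "ss", "x", "z", "o")):
        return w[:-2]                                # catches -> catch, goes -> go
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                                # players -> player
    return w


print(strip_inflection("hitting"), strip_inflection("children"))  # hit child
```

Every inflected form in Tables 1 and 2 maps back to its stem under these rules, but only because the irregular forms are listed explicitly; that is the practical reason the tables keep regular and irregular patterns apart.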
Table 3. Examples of different derivations
Stem         | Stem type | Suffix | Derived noun  | Derived adjective
organization | Noun      | -al    | -             | organizational
spine        | Noun      | -less  | -             | spineless
modernize    | Verb      | -ation | modernization | -
appoint      | Verb      | -ee    | appointee     | -
bowl         | Verb      | -er    | bowler        | -
depend       | Verb      | -able  | -             | dependable
sharp        | Adjective | -ness  | sharpness     | -
4.3 Cliticization
Combining a stem with a clitic, a morpheme that is a reduced form of a syntactic word (e.g., have reduced to 've), is termed cliticization in English morphology. The clitic often acts syntactically like a pronoun, article, conjunction or verb. Clitics may precede or follow a word: in the former case the clitic is called a proclitic, and in the latter an enclitic. Some examples are given in Table 4.
Table 4. Examples of clitics and their full forms
Full form   | am | are | have | not | will
Clitic form | 'm | 're | 've  | n't | 'll
In English, the use of clitics is often ambiguous. For example, he'd can be expanded to he had or he would. However, the apostrophe makes the segmentation of English clitics straightforward.
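Tables 3 and 4 lend themselves to lookup-based normalization, which is one way to realize the conversion of derived words into noun forms mentioned in the Introduction. The Python sketch below is hypothetical: the helper names, the suffix tuples and the two-word KNOWN_NOUNS set are assumptions standing in for a real lexicon, and split_clitic returns every candidate expansion so that an ambiguous clitic such as 'd stays ambiguous.

```python
# Lookup-based normalizers for Tables 3 and 4 (hypothetical helpers).
# KNOWN_NOUNS stands in for a real noun dictionary.
ADJECTIVE_SUFFIXES = ("al", "less")            # organizational, spineless
NOUN_SUFFIXES = ("ation", "ee", "er", "ness")  # modernization, appointee, bowler, sharpness
KNOWN_NOUNS = {"organization", "spine"}
CLITICS = {"'m": ["am"], "'re": ["are"], "'ve": ["have"],
           "n't": ["not"], "'ll": ["will"], "'d": ["had", "would"]}


def noun_from_adjective(word: str):
    """Map a noun-derived adjective back to its noun, or return None."""
    w = word.lower()
    for suffix in ADJECTIVE_SUFFIXES:
        if w.endswith(suffix) and w[: -len(suffix)] in KNOWN_NOUNS:
            return w[: -len(suffix)]
    return None


def looks_nominalized(word: str) -> bool:
    """Naive test: does the word end in a noun-forming suffix from Table 3?"""
    return word.lower().endswith(NOUN_SUFFIXES)


def split_clitic(token: str):
    """Return (host, candidate full forms); an unknown token gives (token, [])."""
    t = token.replace("\u2019", "'")  # normalise typographic apostrophes
    for clitic, full_forms in CLITICS.items():
        if t.lower().endswith(clitic) and len(t) > len(clitic):
            return t[: -len(clitic)], full_forms
    return t, []


print(noun_from_adjective("Organizational"))  # organization
print(split_clitic("he'd"))                   # ('he', ['had', 'would'])
```

The suffix test in looks_nominalized also fires on words such as "paper", so in practice it would only be a first filter ahead of a dictionary check, much as noun_from_adjective consults KNOWN_NOUNS.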
4.4 Compounding
Multiple stems are sometimes combined to generate a new word. For example, the word oversee is generated by combining the stems over and see.
4.5 Agreement
In English, the noun and the main verb are required to agree in number; hence, plural markings are important. Gender is also marked: English has the masculine and feminine genders to represent male and female, respectively, while the remaining gender covers any object or thing that cannot be classed as male or female. When the number of such classes is very large, they are usually referred to as noun classes instead of genders.
5 Proposed System
The work described here is a rule based mechanism to extract NPs from text documents. It first accepts the whole text document and extracts its sentences, using the full stop as delimiter. The extracted sentences are then passed through two modules. The first module, the Phrase Extractor, splits each sentence into a number of simple-sentence-like phrases and then continues to extract fundamental phrases from them. The splitting delimiters are punctuation and bracket symbols, conjunctions, prepositions and other stop words. The final list of phrases becomes the input to the second module, which is designed to finalize the NPs from the set of phrases. The present work therefore uses a top-down approach to extract NPs. The modules are detailed in the following subsections.
5.1 Phrase Extractor
In this phase, a text document is broken down into a list of phrases. The sentences are first split into simple-sentence-like phrases, from which fundamental phrases are then extracted. Note that some stop words are ambiguous. For example, in Jammu and Kashmir, the name of an Indian state, and cannot be treated as a conjunction because it joins two nouns.
Letter case also needs care in English. The first letter of a sentence is uppercase in most cases, but a sentence may begin with a stop word, a number or a symbol, so its first word may not be a
noun. Such issues are addressed during simple-sentence-like phrase extraction and also during NP selection (described in Section 5.2, NP Finalization). The detailed mechanism of phrase extraction is given in Algorithm 1.
Algorithm 1:
Input: A text document
Output: List of phrases
Step 1: Extract sentences from the text document using the full stop as delimiter.
Step 2: If the sentence's first character is an uppercase letter {
            If (the first word is a stop word) {
                Change the first character to lowercase
            }
            Else {
                No change
            }
        }
Step 3: Split each sentence into sub-sentences using punctuation and bracket symbols as delimiters.
Step 4: Split each sub-sentence into parts using conjunctions as delimiters.
        If the conjunction is 'and':
            If the string before 'and' has a verb/verb phrase {
                If the string after 'and' has a verb/verb phrase {
                    Split using 'and' as delimiter
                }
                Else {
                    No change
                }
            }
            Else {
                No change
            }
Step 5: Split each part into sub-parts using prepositions as delimiters.
Step 6: Split the sub-parts using clitics to get sub-sub-parts.
Step 7: Split the sub-sub-parts into phrases using other stop words (common verbs, common adjectives and adverbs, pronouns) as delimiters.
5.2 NP Finalization
The list obtained from the first module (Phrase Extractor) may contain phrases with stop words attached at the beginning or at the end. Furthermore, a phrase itself may not be an NP. Therefore, a purifying mechanism has been designed to produce the final list of NPs from the phrase list. The mechanism is described in Algorithm 2.
Algorithm 2:
Input: List of phrases
Output: List of NPs
For (all phrases in the list) {
    Step 1: If the phrase contains only stop word(s) or unwanted symbol(s), delete the phrase.
    Step 2: If the phrase starts with an article or an uppercase letter, make no change.
    Step 3: If the phrase contains a stop word or unwanted symbol at the beginning or end, prune it from the phrase.
    Step 4: If a word in the phrase has a suffix and does not begin with a capital letter, split the phrase.
}
6 Result and Discussion
For the experiments, 50 text documents from 3 popular newspapers were collected; the data are described in Section 3 (Corpus Description). The intermediate and final results of the designed process are illustrated with an example sentence. Note that articles and pronouns are not considered.
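Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop lists are tiny hand-picked stand-ins for the paper's lists, verbs are detected by lookup in the hypothetical VERBS set, the Step 3 delimiter set is assumed, and Steps 2 and 4 of Algorithm 2 are omitted because the paper does not pin them down precisely. Leading articles are trimmed because the paper's sample output omits them.

```python
import re

# Tiny illustrative stop lists (hypothetical; the paper's lists hold 49
# prepositions, 26 conjunctions, 3 articles, 6 clitics and 682 other words).
ARTICLES = {"a", "an", "the"}
CONJUNCTIONS = {"but", "or", "that", "because", "although"}  # 'and' is special-cased
PREPOSITIONS = {"of", "in", "during", "for", "on", "at", "by", "with", "from", "to"}
CLITICS = ("'m", "'re", "'ve", "n't", "'ll", "'d")
VERBS = {"has", "have", "had", "been", "is", "are", "was", "will", "feels", "help"}
OTHER_STOP = VERBS | {"only", "his", "her", "their", "he", "she", "they", "it"}
ALL_STOP = ARTICLES | CONJUNCTIONS | PREPOSITIONS | OTHER_STOP | {"and"}


def is_stop(word):
    """Stop word, or a token with no letters/digits (an 'unwanted symbol')."""
    return word.lower() in ALL_STOP or not any(ch.isalnum() for ch in word)


def split_on(words, is_delimiter):
    """Split a word list wherever is_delimiter(word) holds; drop empty pieces."""
    pieces, current = [], []
    for w in words:
        if is_delimiter(w):
            pieces.append(current)
            current = []
        else:
            current.append(w)
    pieces.append(current)
    return [p for p in pieces if p]


def split_on_and(words):
    """Step 4 for 'and': split only if both sides contain a verb."""
    for i, w in enumerate(words):
        if w.lower() == "and":
            before, after = words[:i], words[i + 1:]
            if any(x.lower() in VERBS for x in before) and any(x.lower() in VERBS for x in after):
                return split_on_and(before) + split_on_and(after)
    return [words]


def extract_phrases(text):
    """Algorithm 1 (sketch): text -> list of candidate phrases (word lists)."""
    phrases = []
    for sentence in re.split(r"\.", text):                          # Step 1
        words = sentence.split()
        if not words:
            continue
        if words[0][0].isupper() and words[0].lower() in ALL_STOP:   # Step 2
            words[0] = words[0][0].lower() + words[0][1:]
        for sub in re.split(r"[,;:!?()\[\]]", " ".join(words)):      # Step 3
            parts = []
            for piece in split_on(sub.split(), lambda w: w.lower() in CONJUNCTIONS):
                parts.extend(split_on_and(piece))                    # Step 4
            for part in parts:
                for sp in split_on(part, lambda w: w.lower() in PREPOSITIONS):       # Step 5
                    for ssp in split_on(sp, lambda w: w.lower().endswith(CLITICS)):  # Step 6
                        phrases.extend(split_on(ssp, lambda w: w.lower() in OTHER_STOP))  # Step 7
    return phrases


def finalize_nps(phrases):
    """Algorithm 2, Steps 1 and 3 (sketch): drop all-stop phrases, trim edges."""
    nps = []
    for words in phrases:
        if all(is_stop(w) for w in words):  # Step 1
            continue
        while is_stop(words[0]):            # Step 3: leading stop words/symbols
            words = words[1:]
        while is_stop(words[-1]):           # Step 3: trailing stop words/symbols
            words = words[:-1]
        nps.append(" ".join(words))
    return nps


sentence = ("The former Australian batsman has been a part of the South African "
            "support staff during the World Cup in Australia and New Zealand, and "
            "T20 captain Faf Du Plessis feels that his presence will only help youngsters.")
for phrase in finalize_nps(extract_phrases(sentence)):
    print(phrase)
```

On the example sentence discussed next, this sketch returns the eight phrases of the desired list, including "part", which the authors' system missed; that outcome reflects the hand-picked stop lists rather than anything about the method's accuracy. Note how "Australia and New Zealand" survives Step 4 because no verb follows the 'and'.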
The quantitative evaluation was prepared manually. All valid NPs in the documents were first selected by hand. The designed process was then applied to the same text documents, and the human-generated list of NPs was compared with the NP list generated by the proposed system. The results were quantified using recall, precision and F-measure. The average recall, precision and F-measure for these 50 text documents are 97%, 74% and 84%, respectively. Sample results are given in Table 5.
Example Sentence: The former Australian batsman has been a part of the South African support staff during the World Cup in Australia and New Zealand, and T20 captain Faf Du Plessis feels that his presence will only help youngsters.
Fig 2. Successive breaking/splitting of the sample sentence to get NPs
Desired result (NP list):
1. former Australian batsman
2. part
3. South African support staff
4. World Cup
5. Australia and New Zealand
6. T20 captain Faf Du Plessis
7. presence
8. youngsters
Result from the proposed system:
1. former Australian batsman
2. South African support staff
3. World Cup
4. Australia and New Zealand
5. T20 captain Faf Du Plessis
6. presence
7. youngsters
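The evaluation measures can be made concrete with a short Python sketch. precision_recall_f is a hypothetical helper using the standard definitions P = TP/(TP+FP), R = TP/(TP+FN) and F = 2PR/(P+R). For the example sentence, the system returned 7 correct NPs and missed one ("part"), giving P = 1.000, R = 0.875 and F ≈ 0.933. Applied to the counts in Table 5, it reproduces the printed precision and recall; the recomputed F-measure is 0.900 for document 3 (printed 0.897) and 0.824 for document 4 (printed 0.823).

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Precision, recall and F-measure (harmonic mean of the two) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Example sentence: 7 system NPs, all correct (TP = 7, FP = 0); "part" missed (FN = 1).
p, r, f = precision_recall_f(7, 0, 1)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=1.000 R=0.875 F=0.933

# (document, TP, FP, FN) as printed in Table 5
TABLE_5 = [(1, 165, 57, 0), (2, 111, 54, 2), (3, 94, 20, 1), (4, 35, 14, 1), (8, 84, 16, 0)]
for doc, tp, fp, fn in TABLE_5:
    p, r, f = precision_recall_f(tp, fp, fn)
    print(f"doc {doc}: P={p:.3f} R={r:.3f} F={f:.3f}")
```

As a rough consistency check on the averages, 2 × 0.74 × 0.97 / (0.74 + 0.97) ≈ 0.84, in line with the reported 84%, although an average of per-document F-measures need not equal the F-measure of averaged precision and recall.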
Table 5. Detailed results for 5 sample text documents
Document # | # of words | TP  | FP | FN | Precision | Recall | F-measure
1          | 692        | 165 | 57 | 0  | 0.743     | 1      | 0.853
2          | 572        | 111 | 54 | 2  | 0.673     | 0.982  | 0.799
3          | 308        | 94  | 20 | 1  | 0.825     | 0.989  | 0.897
4          | 175        | 35  | 14 | 1  | 0.714     | 0.972  | 0.823
8          | 280        | 84  | 16 | 0  | 0.840     | 1      | 0.913
Conclusion
Any natural language follows certain rules or grammar. Although the rules vary from language to language, most widely spoken languages share broadly similar grammatical syntax and structure. The present work proposes a mechanism to extract NPs from English text documents using these rules; specifically, English morphological rules are considered. The algorithm is simple, and although it may seem rather primitive, it offers some important benefits because it takes into account some subjective or aesthetic features of natural language. The proposed system therefore has a tendency towards universality. The mechanism also extracts NPs in acceptable time. The average recall, precision and F-measure for these 50 English text documents are 97%, 74% and 84%, respectively.
The English vocabulary is very large and still growing, since English has borrowed many words from other languages such as Latin, Sanskrit and French; as a result, the number of nouns may also increase. This approach relies on storing a significant number of stop words, but the stop list is not exhaustive, so its coverage may constrain overall performance. Incorporating more composite rules into phrase extraction could enhance the model.
References
1 Jurafsky, D., & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition (2nd ed.). Pearson.
2 Bennett, N. A., et al. (1999). Extracting noun phrases for all of MEDLINE. In Proceedings of the AMIA Symposium. American Medical Informatics Association.
3 Dhivya, R., Dhanalakshmi, V., Kumar, M. A., & Soman, K. P. (2012). Clause boundary identification for Tamil language using dependency parsing. In Signal Processing and Information Technology (pp. 195-197). Springer Berlin Heidelberg.
4 Nair, L. R., & Peter, S. D. (2011, October). Shallow parser for Malayalam language using finite state cascades. In Biomedical Engineering and Informatics (BMEI), 2011 4th International Conference on (Vol. 3, pp. 1264-1267). IEEE.
5 El-Kahlout, I. D., & Akın, A. A. (2013). Turkish constituent chunking with morphological and contextual features. In Computational Linguistics and Intelligent Text Processing (pp. 270-281). Springer Berlin Heidelberg.
6 Bapat, M., Gune, H., & Bhattacharyya, P. (2010, August). A paradigm-based finite state morphological analyzer for Marathi. In Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing (WSSANLP) (pp. 26-34).
7 Marino, J. B., Banchs, R. E., Crego, J. M., de Gispert, A., Lambert, P., Fonollosa, J. A., & Costa-Jussà, M. R. (2006). N-gram-based machine translation. Computational Linguistics, 32(4), 527-549.
8 Serrano, J. I., & Araujo, L. (2005, September). Evolutionary algorithm for noun phrase detection in natural language processing. In Evolutionary Computation, 2005. The 2005 IEEE Congress on (Vol. 1, pp. 640-647). IEEE.
9 Dhanalakshmi, V., & Rajendran, S. (2010). Natural language processing tools for Tamil grammar learning and teaching. International Journal of Computer Applications (0975-8887), 8(14).
10 Huang, Y., Lowe, H. J., Klein, D., & Cucina, R. J. (2005). Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon. Journal of the American Medical Informatics Association, 12(3), 275-285.
11 Alberti, C., Weiss, D., Coppola, G., & Petrov, S. (2015). Improved transition-based parsing and tagging with neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
12 Ballesteros, M., Dyer, C., & Smith, N. A. (2015). Improved transition-based parsing by modeling characters instead of words with LSTMs. arXiv preprint arXiv:1508.00657.
13 Chakraborty, T., Das, D., & Bandyopadhyay, S. (2014). Identifying Bengali multiword expressions using semantic clustering. Lingvisticæ Investigationes, 37(1), 106-128.
14 Pattabhi R K Rao T, Vijay Sundar Ram R, Vijayakrishna R, & Sobha L. (2007). A text chunker and hybrid POS tagger for Indian languages. In Proceedings of IJCAI-2007, SPSAL-2007.
15 Kumar, A., & Nair, S. B. (2007). An artificial immune system based approach for English grammar checking. In Artificial Immune Systems (pp. 348-357). Springer Berlin Heidelberg.
16 Bindu, M. S., & Idicula, S. M. (2011). A hybrid model for phrase chunking employing artificial immunity system and rule based methods. International Journal of Artificial Intelligence & Applications, 2(4), 95.
17 Park, S. B., & Zhang, B. T. (2003, July). Text chunking by combining hand-crafted rules and memory-based learning. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1 (pp. 497-504). Association for Computational Linguistics.
18 Kian, S., Akhavan, T., & Shamsfard, M. (2009, October). Developing a Persian chunker using a hybrid approach. In Computer Science and Information Technology, 2009. IMCSIT'09. International Multiconference on (pp. 227-234). IEEE.
19 Ibrahim, A., & Assabie, Y. (2013). Hierarchical Amharic base phrase chunking using HMM with error pruning. In Proceedings of the 6th Conference on Language and Technology, Poznan, Poland (pp. 328-332).
Jaya Gera¹ and Harmeet Kaur²
¹ Department of Computer Science, Shyama Prasad Mukherji College, University of Delhi, Delhi, India, jayagera@spm.du.ac.in
² Department of Computer Science, Hans Raj College, University of Delhi, Delhi, India, hkaur@hrc.du.ac.in
Recommending Investors using Association Rule Mining for Crowd Funding Projects
Abstract: Many projects fail to meet their funding goal due to a lack of sufficient funders. Crowd funders are the key component of the crowdfunding phenomenon: their monetary support makes a project's success possible, and their decisions are based on the project's quality and their own interests. In this paper, we aim to help projects meet their goals by recommending promising projects to potential funders. We have developed a recommendation model that learns funders' interests and recommends promising projects that match their profiles. A profile is generated from a funder's backing history. The experiment is conducted on a Kickstarter dataset. Projects are analysed on several aspects: various project features, the funding pattern during the funding cycle, success probability, etc. Initially, recommendations are generated by mining funders' histories using association rules. As few backers have backed multiple projects, the data are sparse. Although association rule mining is quite efficient and generates important rules, it is not able to promote promising projects. Recommendations are therefore refined by identifying promising projects on the basis of the percentage of funding received, pledge behaviour and success probability.
Keywords: crowdfunding; recommender systems; association rule mining; user interest; success probability; pledge behaviour.
1 Introduction
One of the most challenging tasks in setting up a new venture is arranging sufficient funds. Although crowdfunding has emerged as a viable alternative for raising funds for new ventures, not all ventures succeed in raising sufficient funds. One of the most common reasons for failure is that venture initiators are novices and have difficulty understanding and leveraging their social network
to reach the correct audience for their product [1][2]. The audience is diverse in nature and spread across the globe. Its members are not just consumers of products but are increasingly taking on the role of wise investors/funders [3]. Crowd funders not only provide monetary support but also motivate and influence other funders' decisions and help promote projects, which is essential for a project's success. Capturing the wisdom of funders and finding funders who match a project's profile cannot be done by the project initiator or creator alone. This has led to the emergence of crowdfunding intermediaries, or crowdfunding platforms.
Crowdfunding platforms act as intermediaries between project initiators and potential funders [4]. They provide certain functionality and act as electronic matching markets that overcome information asymmetry and reduce costs [5]. Their objective is to maximize the number of successful projects [6]. To achieve this, they need to design policies and strategies that motivate funders to fund [7], so that more projects on the site get funded. This can be achieved by analysing funders' trends, the timing of their contributions and the coordination among them [7]. With the increasing popularity of crowdfunding, crowdfunding platforms have mushroomed. Most platforms do little more than provide a place to present projects and an online payment mechanism to collect pledges [8]. However, some of them provide value-added services such as suggestions [8], help in expanding networks [6], trust building and much more. These efforts by crowdfunding platforms have a positive influence and help creators raise funds.
Figure 1. Source: https://www.kickstarter.com/
Some platforms also assess and promote projects on their sites. For example, the Kickstarter platform promotes projects in various ways: staff picks, project of the
day, popular projects, and a way to discover projects popular within one's circle of friends, etc. Figure 1 shows one such snapshot of Kickstarter (retrieved on 5 Jan 2016).
In the literature, little attention has been paid to matching projects with suitable funders in order to promote them. The proposed work develops a method that generates rules from backers' funding patterns, learns funders' profiles and funding trends, matches projects with these profiles and recommends them to funders. The aim is to assist project initiators in raising funds and to improve the performance of, and add functionality to, the platforms. The rest of the paper discusses related work, then the dataset and its characteristics, followed by the proposed work and the conclusion.
2 Related Work
Although crowdfunding is now a mature domain, the dynamics of crowdfunding platforms are still poorly understood [4]. Most of the literature focuses on analysing various factors and their impact on the success of crowdfunding projects, the role of social media and geography, the impact of social network size on success, the motivation behind investment decisions, the effect of timing and coordination of investors, etc. Less emphasis has been given to understanding the role of various platforms, ways of making policies, and understanding and extending existing functionality. Some researchers have paid attention to these dimensions and brought new insights about crowdfunding intermediaries.
Ref. [5] developed an empirical taxonomy of crowdfunding platforms that characterizes various crowdfunding intermediation models on the basis of hedonism, altruism and profit. They also examined how crowdfunding intermediaries manage financial intermediation and how they transform the relations between initiator and funder in two-sided online markets. Ref. [9] proposed different ways of recommending investors based on Twitter data.
Recommendations are generated on the basis of the pledge behaviour of frequent and occasional investors. The research suggested that frequent investors are attracted by ambitious projects, whereas occasional investors act as donors and mainly invest in art-related projects.
Ref. [10] categorized Kickstarter project features as social, temporal and geographical features and analysed their impact on project success. The study also built a recommendation model using gradient boosting trees that uses these features to recommend a set of backers for Kickstarter projects.
Ref. [7] analysed donors' contributions, the timing of their contributions and the coordination among donors, and their impact on project funding. Ref. [11] suggests that donors' funding decisions play an important role in the ultimate success of a crowdfunding project: potential donors look at the level of support from other backers, as well as its timing, before making their own funding decision. Ref. [12] observed the temporal distribution of customer interest and concluded that there is a strong correlation between a crowdfunding project's early promotional activities and its final outcome. They also discussed the importance of concurrent promotion of projects from multiple sources.
Ref. [1] revealed that interacting and connecting with certain key individuals provides an advantage in acquiring resources and spreading information about projects. The study also found that a small portion of funds comes from strong ties (family, friends, colleagues, relatives, etc.), while a large portion comes from weak ties, i.e., people in the creator's network whom the creator has rarely met or interacted with. Ref. [5] also suggested that matching projects with their potential investors enables successful funding.
Various research studies suggest that the crowdfunding market is growing fast in all respects, i.e., the volume of projects launched every day, the number of funders turning up and the number of platforms. But an increase in volume does not imply an increase in performance, so there is a need for a mechanism to match projects and funders. The contributions of this work are:
i. Adding to platform functionality by automatically matching projects with potential funders
ii. Understanding funders and their interests by maintaining their profiles
iii. Assisting initiators in evaluating the success prospects of their projects
3 Data Set
This experiment is conducted on Kickstarter data.
The dataset consists of data about projects, the funding history of those projects and project backers. Projects and their funding histories are obtained from the Kickspy¹ website, and backer data for each project in the dataset are obtained by crawling backers' pages on the Kickstarter² website. The dataset covers 4,862 projects launched in April 2014 and the backing history of the 97,608 backers who backed these projects. Project data include project ID, project name, goal amount, pledged amount, status, category, subcategory, duration, rewards, Facebook friends, Facebook shares, start date, end date, etc. Pledge data consist of the amount pledged to each project on each day of its funding cycle. The dataset also contains
data of live, suspended and cancelled projects. These projects and their backing transactions are removed for the purpose of analysis. After removing them, the dataset is left with 4,121 projects and 92,770 backers. Of these 4,121 projects, 1,899 (46%) are successful and 2,232 (54%) are unsuccessful.
Recommending Investors using Association Rule Mining for Crowd Funding projects | 31
To better understand individual project characteristics, a Mean Value Analysis is performed. Table 1 lists the mean values of some project characteristics.
Table 1. Mean Value Analysis
Characteristic      All         Successful     Unsuccessful
Projects            4,121       1,899 (46%)    2,232 (54%)
Goal Amount         54537.69    10882.21       91484.46
Pledged Amount      11393.78    22166.31       2276.70
Backers             139.70      272.80         27.06
Rewards             9.82        11.30          8.57
Updates             3.18        5.25           1.42
Comments            19.62       38.95          3.26
Duration            31.81       30.14          33.21
Facebook Friends    711.25      823.66         613.07
In a nutshell, successful campaigns on average have lower goals and shorter durations. They also have significantly more backers and Facebook friends supporting them, offer more rewards, and show better interaction between creators and funders through updates and comments.
¹ http://www.kickspy.com. This website is currently shut down.
² https://www.kickstarter.com/
* Backers' pages have since been removed from the Kickstarter website.
4 Proposed work
Many projects fail to complete because they cannot publicize themselves and attract enough funders. For a project to be successful, it must reach its funding goal. To reach that goal, there must be enough investors who are willing to invest and take the risk. Research reveals that 20-40% of initial funding comes from family and friends [13]. However, a large number of funders are unknown to the creator and fund projects for various reasons. With the
growth of technology and improvements in security, a large number of creators as well as funders are participating online. These contributions are small in amount and spread across various projects [14]. Because the funding amounts are not very large and come from a large network of unknown people, investors' funding needs to be coordinated [14] so that more projects succeed. To assist potential funders, we have developed a recommender model that learns from funders' backing histories using association rule mining, and recommends and promotes projects among potential funders.
4.1 Method
Our aim is to assist initiators, funders and platforms alike, so that the overall success rate of the platform increases and all stakeholders benefit. Some important issues are: which projects need to be promoted, and what criteria should be used to identify them? Projects that signal high quality and are popular in social networks get funded quickly. Projects that signal low quality raise nothing or very little, and may not get funded even by friends and family. Projects that are of good quality and perform well initially, but lose momentum later on, are the best candidates for promotion. This model identifies such projects by analysing their quality and funding pattern. Fig. 2 shows the model components. The recommendation model has five modules: i) Predictor, ii) Trend Monitor, iii) Profile Modeller, iv) Rule Generator and v) Recommender.
Predictor: Some projects perform well on the monetary front and attract more than the required amount. Others perform poorly and attract little or no monetary investment. Project success is also influenced by project quality [15], which is assessed through project preparation and presentation. Assessing the true status of project preparation is not feasible, because creators disclose only as much as they wish to. Crowdfunding suffers from information asymmetry [6], i.e.
the creator knows the actual situation, whereas a funder can assess it only from the information disclosed. So, in this module, project success is evaluated based on project presentation. A project is characterized by various features such as category, whether it has a video, number of videos, number of images, goal amount, duration, Facebook friends, etc. These features are good indicators of project quality. Project success is predicted by feeding these attributes to a logistic regression model. This module predicts project success with 81.5% accuracy.
Figure 2. Recommender model
Trend Monitor: The Predictor's prediction is based on static features available when a project is launched, such as goal amount and category. It does not assess a project's performance after launch. Our aim is to promote projects that are of good quality but could not raise enough and fall short by a small margin. We need to identify projects whose presentations are as good as those of successful ones but which grow slowly during their funding cycle. This can be done by monitoring their funding behaviour, which requires understanding the nature of successful and unsuccessful funding patterns. Successful projects generally grow faster than unsuccessful ones. Pledge analysis [16] states that if a campaign has raised approximately 20% of its goal within the first 15% of its funding cycle, its probability of success is high. Unsuccessful projects initially start well but fail to sustain this growth after some time. So, campaigns that raise 20% of their funds within 20% of the funding time are good candidates for promotion. The Trend Monitor module analyses the funding behaviour of projects and identifies such projects.
Profile Modeller: This module learns a backer's profile by analysing the backer's funding history. The profile of a backer $B_i$ is defined as
$$B_i = \{\mathit{Backer\_id}_i,\ \mathit{Name}_i,\ \mathit{Location}_i,\ \mathit{CategoryPref}_i\}$$
Each backer is assigned a unique identifier, backer_id. The Name attribute contains the backer's name, and the Location attribute contains the backer's address and city. CategoryPref is generated by scanning the backing history and finding the category and subcategory of each project backed. $\mathit{CategoryPref}_i$ is a set defined as:
$$\mathit{CategoryPref}_i = \{\{\mathit{Cat}_{j_1}, \mathit{subcat}_{k_1}, n_{k_1}\}, \{\mathit{Cat}_{j_2}, \mathit{subcat}_{k_2}, n_{k_2}\}, \ldots, \{\mathit{Cat}_{j_m}, \mathit{subcat}_{k_m}, n_{k_m}\}\}$$
i.e. the backer has supported $n_{k_a}$ projects of subcategory $k_a$ of category $j_a$.
Rule Generator: The recommender system not only identifies projects to be promoted but also follows backers' trends and learns which projects backers frequently back. To understand the behaviour patterns of backers, we used association rule mining, a data mining technique. Association rule mining aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories [17]. It is generally used in Market Basket Analysis, where it mines transaction histories and reveals which items are frequently bought together. As we are interested in knowing which projects are backed together by different backers, we have used association rule mining.
Rules are generated by applying the Apriori algorithm. Two parameters, support and confidence, are used to measure the interestingness of rules. Rules that satisfy both the minimum support and the minimum confidence values are referred to as strong association rules and are of interest [17][18]. For association rules of the form X ⇒ Y, where X and Y are sets of items, support and confidence are defined as:
$$\mathrm{Support}(X \Rightarrow Y) = \frac{\text{number of transactions containing both } X \text{ and } Y}{\text{total number of transactions}}$$
$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\text{number of transactions containing both } X \text{ and } Y}{\text{number of transactions containing } X}$$
Association rule mining has two phases: i) finding frequent itemsets and ii) generating rules. The first phase finds itemsets that satisfy the minimum support count.
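To make the two measures concrete, here is a minimal Python sketch that computes them directly from the definitions above. It assumes that each backer's history is represented as a set of project IDs (one "transaction" per backer). This is an illustrative choice, not necessarily the authors' representation. The project IDs P1 to P4 and the five backers are hypothetical.

```python
def support_count(itemset, transactions):
    """Number of transactions (backers) whose backed projects include every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(x, y, transactions):
    """Support(X => Y): fraction of all transactions that contain both X and Y."""
    return support_count(set(x) | set(y), transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence(X => Y): transactions containing X and Y divided by transactions containing X."""
    x_count = support_count(x, transactions)
    if x_count == 0:
        return 0.0  # the antecedent X never occurs, so the rule has no evidence
    return support_count(set(x) | set(y), transactions) / x_count


# Hypothetical backing histories: one set of project IDs per backer.
backers = [
    frozenset({"P1", "P2", "P3"}),
    frozenset({"P1", "P2"}),
    frozenset({"P1", "P4"}),
    frozenset({"P2", "P3"}),
    frozenset({"P1", "P2", "P4"}),
]

print(support({"P1"}, {"P2"}, backers))     # 0.6
print(confidence({"P1"}, {"P2"}, backers))  # 0.75
```

Four of the five hypothetical backers backed P1, and three of those also backed P2. The rule P1 ⇒ P2 therefore has support 3/5 = 0.6 and confidence 3/4 = 0.75.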
The second phase generates rules from these frequent itemsets, keeping those that satisfy the confidence threshold. Here, the dataset consists of the lists of projects backed by backers. Let us understand the Apriori algorithm with the help of an example.
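The following is a minimal sketch of the level-wise Apriori algorithm covering both phases described above. It finds frequent itemsets by a minimum support count, then generates rules by a minimum confidence. It is not the authors' implementation. The function names `apriori` and `generate_rules`, the backing histories and both thresholds are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Level-wise Apriori: return {frozenset(itemset): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Phase 1, level 1: count each single project and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: merge two frequent (k-1)-itemsets whose union has exactly k items.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count each surviving candidate against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


def generate_rules(frequent, min_confidence):
    """Phase 2: return (X, Y, confidence) for every rule X => Y that meets min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is in the table.
                conf = count / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical backing histories (one set of project IDs per backer).
backers = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P1", "P4"},
    {"P2", "P3"},
    {"P1", "P2", "P4"},
]

frequent = apriori(backers, min_support_count=3)
for x, y, conf in generate_rules(frequent, min_confidence=0.7):
    print(sorted(x), "=>", sorted(y), f"confidence={conf:.2f}")
```

With a minimum support count of 3, only {P1}, {P2} and {P1, P2} are frequent. The sketch therefore reports the two rules P1 ⇒ P2 and P2 ⇒ P1, each with confidence 0.75; the printing order may vary. One way such rules could feed a recommender is to suggest P2 to a backer who has backed P1 but not P2. That step is an illustrative assumption and is not taken from the paper's Recommender module.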
IN THE TEXT
Flint Arrow-heads
Flint Scrapers
A Cooking-pot
Flint Scrapers
Fragment of Cooking-pot
Cross, Whitchurch Down
Plan of Hut, Shapley Common
Hut Circle, Grimspound
Logan Rock. The Rugglestone, Widdecombe
Roos Tor Logans
Covered Chamber, Whit Tor
Construction of Stone and Timber Wall
Tin-workings, Nillacombe
Mortar-stone, Okeford
Slag-pounding Hollows, Gobbetts
Smelting in 1556
Plan of Blowing-house, Deep Swincombe
Tin-mould, Deep Swincombe
Smelting Tin in Japan
A Primitive Hinge
Inscription on Sourton Cross
Inscribed Stone, Sticklepath
Plan of Stone Rows near Caistor Rock
Plan of Grimspound
Plan of Hut at Grimspound
Fragment of Pottery
Ornamented Pottery
Tom Pearce's Ghostly Mare
Crazing-mill Stone, Upper Gobbetts
Method of using the Mill-stones
Chancel Capital, Meavy
Blowing-house below Black Tor
DARTMOOR
CHAPTER I.
BOGS
The rivers that flow from Dartmoor—The bogs are their cradles—A tailor lost on the moor—A man in Aune Mire—Some of the worst bogs—Cranmere Pool—How the bogs are formed—Adventure in Redmoor Bog—Bog plants—The buckbean—Sweet gale—Furze—Yellow broom—Bee-keeping.
Dartmoor proper consists of that upland region of granite, rising to nearly 2,000 feet above the sea, and actually shooting above that height at a few points, which is the nursery of many of the rivers of Devon. The Exe, indeed, has its source in Exmoor, and it disdains to receive any affluents from Dartmoor; and the Torridge takes its rise hard by the sea at Wellcombe, within a rifle-shot of the Bristol Channel, nevertheless it makes a graceful sweep—tenders a salute—to Dartmoor, and in return receives the liberal flow of the Okement. The Otter and the Axe, being in the far east of the county, rise in the range of hills that form the natural frontier between Devon and Somerset. But all the other considerable streams look back upon Dartmoor as their mother. And what a mother! She sends them forth limpid and pure, full of laughter and leap, of flash and brawl. She does not discharge them laden with brown mud, as the Exe, nor turned like the waters of Egypt to blood, as the Creedy. A prudent mother, she feeds them regularly, and with considerable deliberation. Her vast bogs act as sponges, absorbing the winter
rains, and only leisurely and prudently does she administer the hoarded supply, so that the rivers never run dry in the hottest and most rainless summers. Of bogs there are two sorts, the great parental peat deposits that cover the highland, where not too steep for them to lie, and the swamps in the bottoms formed by the oozings from the hills that have been arrested from instant discharge into the rivers by the growth of moss and water-weeds, or are checked by belts of gravel and boulder. To see the former, a visit should be made to Cranmere Pool, or to Cut Hill, or Fox Tor Mire. To get into the latter a stroll of ten minutes up a river-bank will suffice. The existence of the great parent bogs is due either to the fact that beneath them lies the impervious granite, as a floor, somewhat concave, or to the whole rolling upland being covered, as with a quilt, with equally impervious china-clay, the fine deposit of feldspar washed from the granite in the course of ages. In the depths of the moor the peat may be seen riven like floes of ice, and the rifts are sometimes twelve to fourteen feet deep, cut through black vegetable matter, the product of decay of plants through countless generations. If the bottom be sufficiently denuded it is seen to be white and smooth as a girl's shoulder—the kaolin that underlies all. On the hillsides, and in the bottoms, quaking-bogs may be lighted upon or tumbled into. To light upon them is easy enough, to get out of one if tumbled into is a difficult matter. They are happily small, and can be at once recognised by the vivid green pillow of moss that overlies them. This pillow is sufficiently close in texture and buoyant to support a man's weight, but it has a mischievous habit of thinning around the edge, and if the water be stepped into where this fringe is, it is quite possible for the inexperienced to go under, and be enabled at his leisure to investigate the lower surface of the covering duvet of porous moss.
Whether he will be able to give to the world the benefit of his observations may be open to question.
The thing to be done by anyone who gets into such a bog is to spread his arms out—this will prevent his sinking—and if he cannot struggle out, to wait, cooling his toes in bog water, till assistance comes. It is a difficult matter to extricate horses when they flounder in, as is not infrequently the case in hunting; every plunge sends the poor beasts in deeper. One afternoon, in the year 1851, I was in the Walkham valley above Merrivale Bridge digging into what at the time I fondly believed was a tumulus, but which I subsequently discovered to be a mound thrown up for the accommodation of rabbits, when a warren was contemplated on the slope of Mis Tor. Towards evening I was startled to see a most extraordinary object approach me—a man in a draggled, dingy, and disconsolate condition, hardly able to crawl along. When he came up to me he burst into tears, and it was some time before I could get his story from him. He was a tailor of Plymouth, who had left his home to attend the funeral of a cousin at Sampford Spiney or Walkhampton, I forget which. At that time there was no railway between Tavistock and Launceston; communication was by coach. When the tailor, on the coach, reached Roborough Down, "'Ere you are!" said the driver. "You go along there, and you can't miss it!" indicating a direction with his whip. So the tailor, in his glossy black suit, and with his box-hat set jauntily on his head, descended from the coach, leaped into the road, his umbrella, also black, under his arm, and with a composed countenance started along the road that had been pointed out. Where and how he missed his way he could not explain, nor can I guess, but instead of finding himself at the house of mourning, and partaking there of cake and gin, and dropping a sympathetic tear, he got up on to Dartmoor, and got—with considerable dexterity—away from all roads.
He wandered on and on, becoming hungry, feeling the gloss go out of his new black suit, and raws develop upon his top-hat as it got
knocked against rocks in some of his falls. Night set in, and, as Homer says, "all the paths were darkened"—but where the tailor found himself there were no paths to become obscured. He lay in a bog for some time, unable to extricate himself. He lost his umbrella, and finally lost his hat. His imagination conjured up frightful objects; if he did not lose his courage, it was because, as a tailor, he had none to lose. He told me incredible tales of the large, glaring-eyed monsters that had stared at him as he lay in the bog. They were probably sheep, but as nine tailors fled when a snail put out its horns, no wonder that this solitary member of the profession was scared at a sheep. The poor wretch had eaten nothing since the morning of the preceding day. Happily I had half a Cornish pasty with me, and I gave it him. He fell on it ravenously. Then I showed him the way to the little inn at Merrivale Bridge, and advised him to hire a trap there and get back to Plymouth as quickly as might be. "I solemnly swear to you, sir," said he, "nothing will ever induce me to set foot on Dartmoor again. If I chance to see it from the Hoe, sir, I'll avert my eyes. How can people think to come here for pleasure—for pleasure, sir! But there, Chinamen eat birds'-nests. There are depraved appetites among human beings, and only unwholesome-minded individuals can love Dartmoor." There is a story told of one of the nastiest of mires on Dartmoor, that of Aune Head. A mire, by the way, is a peculiarly watery bog, that lies at the head of a river. It is its cradle, and a bog is distributed indiscriminately anywhere. A mire cannot always be traversed in safety; much depends on the season. After a dry summer it is possible to tread where it would be death in winter or after a dropping summer. A man is said to have been making his way through Aune Mire when he came on a top-hat reposing, brim downwards, on the sedge. He
gave it a kick, whereupon a voice called out from beneath, "What be you a-doin' to my 'at?" The man replied, "Be there now a chap under'n?" "Ees, I reckon," was the reply, "and a hoss under me likewise." There is a track through Aune Head Mire that can be taken with safety by one who knows it. Fox Tor Mire once bore a very bad name. The only convict who really got away from Princetown and was not recaptured was last seen taking a bee-line for Fox Tor Mire. The grappling irons at the disposal of the prison authorities were insufficient for the search of the whole marshy tract. Since the mines were started at Whiteworks much has been done to drain Fox Tor Mire, and to render it safe for grazing cattle on and about it. There is a nasty little mire at the head of Redaven Lake, between West Mill Tor and Yes Tor, and there is a choice collection of them, inviting the unwary to their chill embraces, on Cater's Beam, about the sources of the Plym and Blacklane Brook, the ugliest of all occupying a pan and having no visible outlet. The Redlake mires are also disposed to be nasty in a wet season, and should be avoided at all times. Anyone having a fancy to study the mires and explore them for bog plants will find an elegant selection around Wild Tor, to be reached by ascending Taw Marsh and mounting Steeperton Tor, behind which he will find what he desires. "On the high tableland," says Mr. William Collier, "above the slopes, even higher than many tors, are the great bogs, the sources of the rivers. The great northern bog is a vast tract of very high land, nothing but bog and sedge, with ravines down which the feeders of the rivers pour. Here may be found Cranmere Pool, which is now no pool at all, but just a small piece of bare black bog. Writers of Dartmoor guide-books have been pleased to make much of this Cranmere Pool, greatly to the advantage of the living guides, who take tourists there to stare at a small bit of black bog, and leave their cards in a receptacle
provided for them. The large bog itself is of interest as the source of many rivers; but there is absolutely no interest in Cranmere Pool, which is nothing but a delusion and a snare for tourists. It was a small pool years ago, where the rain water lodged; but at Okement Head hard by a fox was run to ground, a terrier was put in, and by digging out the terrier Cranmere Pool was tapped, and has never been a pool since. So much for Cranmere Pool! "This great northern bog, divided into two sections by Fur Tor and Fur Tor Cut, extends southwards to within a short distance of Great Mis Tor, and is a vast receptacle of rain, which it safely holds throughout the driest summer. Fur Tor Cut is a passage between the north and south parts of this great bog, evidently cut artificially for a pass for cattle and men on horseback from Tay Head, or Tavy Head, to East Dart Head, forming a pass from west to east over the very wildest part of Dartmoor. Anyone can walk over the bogs; there is no danger or difficulty to a man on foot unless he gets exhausted, as some have done. But horses, bullocks, and sheep cannot cross them. A man on horseback must take care where he goes, and this Fur Tor Cut is for his accommodation."[1] The Fur Tor Mire is not composed of black but of a horrible yellow slime. There is no peat in it, and to cross it one must leap from one tuft of coarse grass to another. The "mires" are formed in basins of the granite, which were originally lakes or tarns, and into which no streams fall bringing down detritus. They are slowly and surely filling with vegetable matter, water-weeds that rot and sink, and as this vegetable matter accumulates it contracts the area of the water surface. In the rear of the long sedge grass or bogbean creeps the heather, and a completely choked-up mire eventuates in a peat bog. Granite has a tendency to form saucer-like depressions.
In the Bairischer Wald, the range dividing Bavaria from Bohemia, are a number of picturesque tarns, that look as though they occupied the
craters of extinct volcanoes. This, however, is not the case; the rock is granite, but in this case the lakes are so deep that they have not as yet been filled with vegetable deposit. On the Cornish moors is Dosmare Pool. This is a genuine instance of the lake in a granitic district. In Redmoor, near Fox Tor, on the same moors, we have a similar saucer, with a granitic lip, over which it discharges its superfluous water, but it is already so much choked with vegetable growth as to have become a mire. Ten thousand years hence it will be a great peat bog. I had an adventure in Redmoor, and came nearer looking into the world beyond than has happened to me before or since. Although it occurred on the Cornish moors, it might have chanced on Dartmoor, in one of its mires, for the character of both is the same, and I was engaged in the same autumn on both sets of moors. Having been dissatisfied with the Ordnance maps of the Devon and Cornish moors, and desiring that certain omissions should be corrected, I appealed to Sir Charles Wilson, of the Survey, and he very readily sent me one of his staff, Mr. Thomas, to go over the ground with me, and fill in the particulars that deserved to be added. This was in 1891. The summer had been one of excessive rain, and the bogs were swollen to bursting. Mr. Thomas and I had been engaged, on November 5th, about Trewartha Marsh, and as the day closed in we started for the inhabited land and our lodgings at "Five Janes." But in the rapidly closing day we went out of our course, and when nearly dark found ourselves completely astray, and worst of all in a bog. We were forced to separate, and make our way as best we could, leaping from one patch of rushes or moss to another. All at once I went in over my waist, and felt myself being sucked down as though an octopus had hold of me. I cried out, but Thomas could neither see me nor assist me had he been able to approach.
Providentially I had a long bamboo, like an alpenstock, in my hand, and I laid this horizontally on the surface and struggled to raise myself by it. After some time, and with desperate effort, I got myself over the bamboo, and was finally able to crawl away like a lizard on my face. My watch was stopped in my waistcoat pocket, one of my
gaiters torn off by the suction of the bog, and I found that for a moment I had been submerged even over one shoulder, as it was wet, and the moss clung to it. On another occasion I went with two of my children, on a day when clouds were sweeping across the moor, over Langstone Moor. I was going to the collection of hut circles opposite Greenaball, on the shoulder of Mis Tor. Unhappily, we got into the bog at the head of Peter Tavy Brook. This is by no means a dangerous morass, but after a rainy season it is a nasty one to cross. Simultaneously down on us came the fog, dense as cotton wool. For quite half an hour we were entangled in this absurdly insignificant bog. In getting about in a mire, the only thing to be done is to leap from one spot to another where there seems to be sufficient growth of water-plants and moss to stay one up. In doing this one loses all idea of direction, and we were, I have no doubt, forming figures of eight in our endeavours to extricate ourselves. I knew that the morass was inconsiderable in extent, and that by taking a straight line it would be easy to get out of it, but in a fog it was not possible to take a bee-line. Happily, for a moment the curtain of mist lifted, and I saw on the horizon, standing up boldly, the stones of the great circle that is planted on the crest. I at once shouted to the children to follow me, and in two minutes we were on solid land. The Dartmoor bogs may be explored for rare plants and mosses. The buckbean will be found and recognised by its three succulent sea-green leaflets, and by its delicately beautiful white flower tinged with pink, in June and July. I found it in 1861 in abundance in Iceland, where it is called Alptar colavr, the swan's clapper. About Hamburg it is known as the "flower of liberty," and grows only within the domains of the old Hanseatic Republic. In Iceland it serves a double purpose.
Its thickly interwoven roots are cut and employed in square pieces like turf or felt as a protection for the backs of horses that are laden with packs. Moreover, in crossing a bog, the clever native ponies always know that they can tread safely where they see the white flower stand aloft.
The golden asphodel is common, and remarkably lovely, with its shades of yellow from the deep-tinted buds to the paler expanded flower. The sundew is everywhere that water lodges; the sweet gale has foliage of a pale yellowish green sprinkled over with dots, which are resinous glands. The berries also are sprinkled with the same glands. The plant has a powerful, but fresh and pleasant, odour, which insects dislike. Country people were wont to use sprigs of it, like lavender, to put with their linen, and to hang boughs above their beds. The catkins yield a quantity of wax. The sweet gale was formerly much more abundant, and was largely employed; it went by the name of the Devonshire myrtle. When boiled, the wax rises to the surface of the water. Tapers were made of it, and were so fragrant while burning, that they were employed in sick-rooms. In Prussia, at one time, they were constantly furnished for the royal household. The marsh helleborine, Epipactis palustris, may be gathered, and the pyramidal orchis, and butterfly and frog orchises, occasionally. The furze—only out of bloom when Love is out of tune—keeps away from the standing water. It is the furze which is the glory of the moor, with its dazzling gold and its honey breath, fighting for existence against the farmer who fires it every year, and envelops Dartmoor in a cloud of smoke from March to June. Why should he do this instead of employing the young shoots as fodder? I think that as Scotland has the thistle, Ireland the shamrock, and Wales the leek as their emblems, we Western men of Devon and Cornwall should adopt the furze. If we want a day, there is that of our apostle S. Petrock, on June 4th. By the streams and rivers and on hedge-banks the yellow broom blazes, yet it cannot rival in intensity of colour and in variety of tint the magnificent furze or gorse.
But the latter is not a pleasant plant to walk amidst, owing to its prickles, and especial care must be observed lest it affix one of these in the knee. The spike rapidly works inwards and produces intense pain and lameness. The moment it is felt to be there, the thing to be done is immediately to
extract it with a knife. From the blossoms of the furze the bees derive their aromatic honey, which makes that of Dartmoor supreme. Yet beekeeping is a difficulty there, owing to the gales, that sweep the busy insects away, so that they fail to find their direction home. Only in sheltered combes can they be kept. The much-relished Swiss honey is a manufactured product of glycerine and pear-juice; but Dartmoor honey is the sublimated essence of ambrosial sweetness in taste and savour, drawn from no other source than the chalices of the golden furze, and compounded with no adventitious matter.
FOOTNOTES:
[1] "Dartmoor," in the Transactions of the Plymouth Institution, 1897-8.
CHAPTER II.
TORS
Dartmoor from a distance—Elevation—The tors—Old lake-beds—"Clitters"—The boldest tors—Luminous moss—The whortleberry—Composition of granite—Wolfram—The "forest" and its surrounding commons—Venville parishes—Encroachment of culture on the moor—The four quarters—A drift—Attempts to reclaim the moor—Flint finds—The inclosing of commons.
Seen from a distance, as for instance from Winkleigh churchyard, or from Exbourne, Dartmoor presents a stately appearance, as a ridge of blue mountains rising boldly against the sky out of rolling, richly wooded underland. But it is only from the north and north-west that it shows so well. From south and east it has less dignity of aspect, as the middle distance is made up of hills, as also because the heights of the encircling tors are not so considerable, nor is their outline so bold. Indeed, the southern edge of Dartmoor is conspicuously tame. It has no abrupt and rugged heights, no chasms cleft and yawning in the range, such as those of the Okement and the Tavy and Taw. And to the east much high ground is found rising in stages to the fringe of the heather-clothed tors.
A TOR, SHOWING WEATHERING OF GRANITE
Dartmoor, consisting mainly of a great upheaved mass of granite, and of a margin of strata that have been tilted up round it, forms an elevated region some thirty-two miles from north to south and twenty from east to west. The heated granite has altered the slates in contact with it, and is itself broken through on the west side by an upward gush of molten matter which has formed Whit Tor and Brent Tor. The greatest elevations are reached on the outskirts, and there, also, is the finest scenery. The interior consists of rolling upland. It has been likened to a sea after a storm suddenly arrested and turned to stone; but a still better resemblance, if not so romantic, is that of a dust-sheet thrown over the dining-room chairs, the backs of which resemble the tors divided from one another by easy sweeps of turf. Most of the heights are crowned with masses of rock standing up like old castles; these, and these only, are tors.[2] Such are the worn-down stumps of vast masses of mountain formation that have disappeared. There are no lakes on or about the moor, but this was not always so. Where is now Bovey Heathfield was once a noble sheet of water fifty fathoms deep. Here have
been found beds of lignite, forests that have been overwhelmed by the wash from the moor, a canoe rudely hollowed out of an oak, and a curious wooden idol was exhumed leaning against a trunk of tree that had been swallowed up in a freshet. The canoe was nine feet long. Bronze spear-heads have also been found in this ancient lake, and moulds for casting bronze instruments. A representation of the idol was given in the Transactions of the Devonshire Association for 1875. The new Plymouth Reservoir overlies an old lake-bed. Taw Marsh was also once a sheet of rippling blue water, but the detritus brought down in the weathering of what once were real mountains has filled them all up. Dartmoor at present bears the same relation to Dartmoor in the far past that the gums of an old hag bear to the pearly range she wore when a fresh girl. The granite of Dartmoor was not well stirred before it was turned out, consequently it is not homogeneous. Granite is made up of many materials: hornblende, feldspar, quartz, mica, schorl, etc. Sometimes we find white mica, sometimes black. Some granite is red, as at Trowlesworthy, and the beautiful band that crosses the Tavy at the Cleave; sometimes pink, as at Leather Tor; sometimes greenish, as above Okery Bridge; sometimes pure white, as at Mill Tor. The granite is of very various consistency, and this has given it an appearance on the tors as if it were a sedimentary rock laid in beds. But this is its little joke to impose on the ignorant. The feature is due to the unequal hardness of the rock which causes it to weather in strata. The fine-grained granite that occurs in dykes is called elvan, which, if easiest to work, is most liable to decay. In Cornwall the elvan of Pentewan was used for the fine church of S. Austell, and as a consequence the weather has gnawed it away, and the greater part has had to be renewed.
On the other hand, the splendid elvan of Haute Vienne has supplied the cathedral of Limoges with a fine-grained material that has been carved like lace, and lasts well. The drift that swept over the land would appear to have been from west to east, with a trend to the south, as no granite has been transported, except in the river-beds to the north or west, whereas blocks have been conveyed eastward. This is in accordance with what is shown by the long ridges of clay on the west of Dartmoor, formed of the rubbing down of the slaty rocks that lie north and north-west. These bands all run north and south on the sides of hills, and in draining processes they have to be pierced from east to west. This indicates that at some period during the Glacial Age there was a wash of water from the north-west over Devon, depositing clay and transporting granite.
On the sides of the tors are what are locally termed "clitters" or "clatters" (Welsh clechr), consisting of a vast quantity of stone strewn in streams from the tors, spreading out fanlike on the slopes. These are the wreckage of the tor when far higher than it is now, i.e. of the harder portions that have not been dissolved and swept away. "The tors—Nature's towers—are huge masses of granite on the top of the hills, which are not high enough to be called mountains, piled one upon another in Nature's own fantastic way. There may be a tor, or a group of tors, crowning an eminence, but the effect, either near or afar, is to give the hilltop a grand and imposing look. These large blocks of granite, poised on one another, some appearing as if they must fall, others piled with curious regularity—considering they are Nature's work—are the prominent features in a Dartmoor landscape, and, wild as parts of Dartmoor are, the tors add a notable picturesque effect to the scene. There are very fine tors on the western side of the moor. Those on the east and south are not so fine as those on the north and west. In the centre of the moor there are also fine tors. They are, in fact, very numerous, for nearly every little hill has its granite cap, which is a tor, and every tor has its name. Some of the high hills that are torless are called beacons, and were doubtless used as signal beacons in times gone by. As the tors are not grouped or built with any design by Nature to attract the eye of man, they are the more attractive on that account, and one of their consequent peculiarities is that from different points of view they never appear the same. There can be no sameness in a landscape of tors when every tor changes its features according to the point of view from which you look at it. Every tor also has its heap of rock at its feet, some of them very striking jumbles of blocks of granite scattered in great confusion between the tor and the foot of the hill.
Fur Tor, which is in the very wildest spot on Dartmoor, and is one of the leading tors, has a clitter of rocks on its western side as remarkable as the tor itself; Mis Tor, also on its western side, has a very fine clitter of granite; Leather Tor stands on the top of a mass of granite rocks on its east and south sides; and Hen Tor, on the south quarter, is surrounded with blocks of granite, with a hollow like the crater of a volcano, as if they had been thrown up by a great convulsion of Nature. Hen Tor is remarkable chiefly for this wonderful mass of granite blocks strewn around it. All the moor has granite boulders scattered about, but they accumulate at the feet of the tors as if for their support."[3]
VIXEN TOR
Here among the clitters, where they form caves, a search may be made for the beautiful moss Schistostega osmundacea. It has a metallic lustre like green gold, and on entering a dark place under rocks, the ground seems to be blazing with gold. In Germany the Fichtel Gebirge are of granite, and the Luchsen Berg is so called because there in the hollow under the rocks grew abundance of the moss glittering like the eyes of a lynx. The authorities of Alexanderbad have had to rail in the grottoes to prevent the gold moss from being carried off by the curious.
Murray says of these retreats of the luminous moss:— "The wonder of the place is the beautiful phosphorescence which is seen in the crannies of the rocks, and which appears and disappears according to the position of the spectator. This it is which has given rise to the fairy tales of gold and gems with which the gnomes and cobolds tantalise the poor peasants. The light resembles that of glow-worms; or, if compared to a precious stone, it is something between a chrysolite and a cat's-eye, but shining with a more metallic lustre. On picking up some of it, and bringing it to the light, nothing is found but dirt." Professor Lloyd found that the luminous appearance was due to the presence of small crystals in the structure which reflect the light. Coleridge says:—
    "'Tis said in Summer's evening hour,
    Flashes the golden-coloured flower,
    A fair electric light."
In 1843, when the luminosity of plants was recorded in the Proceedings of the British Association, Mr. Babington mentioned having seen in the south of England a peculiar bright appearance produced by the presence of the Schistostega pennata, a little moss which inhabited caverns and dark places; but this was objected to on the ground that the plant reflected light, and did not give it off in phosphorescence.[4] When lighted on, it has the appearance of a handful of emeralds or aquamarine thrown into a dark hole, and is frequently associated with the bright green liverwort. Parfitt, in his Moss Flora of Devon, gives it as osmundacea, not as pennata. It was first discovered in Britain by a Mr. Newberry, on the road from Zeal to South Tawton; it is, however, to be found in a good many places, as Hound Tor, Widdecombe, Leather Tor, and in the Swincombe valley, also in a cave under Lynx Tor. If found, please to leave alone. Gathered, it is invisible; the hand or knife brings away only mud.
But what all are welcome to go after is that which is abundant on every moorside—but nowhere finer than on such as have not been subjected to periodical "swaling" or burning. I refer to the whortleberry. This delicious fruit, eaten with Devonshire cream, is indeed a delicacy. A gentleman from London was visiting me one day. As he was fond of good things, I gave him whortleberry and cream. He ate it in dead silence, then leaned back in his chair, looked at me with eyes full of feeling, and said, "I am thankful that I have lived to this day." The whortleberry is a good deal used in the south of France for the adulteration and colouring of claret, whole truck-loads being imported from Germany.
There is an interesting usage in my parish, and I presume the same exists in others. On one day in summer, when the "whorts" are ripe, the mothers unite to hire waggons of the farmers, or borrow them, and go forth with their little ones to the moor. They spend the day gathering the berries, and light their fires, form their camp, and have their meals together, returning late in the evening, very sunburnt, with very purple mouths, very tired maybe, but vastly happy, and with sufficient fruit to sell to pay all expenses and leave something over.
If the reader would know what minerals are found on Dartmoor he must go elsewhere. I have a list before me that begins thus: "Allophane, actinolite, achroite, andalusite, apatite"—but I can copy out no more. I have often found appetite on Dartmoor, but have not the slightest suspicion as to what is apatite. The list winds up with wolfram, about which I can say something. Wolfram is a mineral very generally found along with tin, and that is just the "cussedness" of it, for it spoils tin. When tin ore is melted at a good peat fire, out runs a silver streak of metal. This is brittle as glass, because of the wolfram in it. To get rid of the wolfram the whole has to be roasted, and the operation is delicate, and must have bothered our forefathers considerably. By means of this second process the wolfram, or tungsten as it is also called, is got rid of. Now, it is a curious fact that the tin of Dartmoor is of extraordinary purity; it has little or none of this abominable wolfram associated with it, so that it is by no means improbable that the value of tin as a metal was discovered on Dartmoor, or in some as yet unknown region where it is equally unalloyed.
In Cornwall all the tin is mixed with tungsten. Now this material has been hitherto regarded as worthless; it has been sworn at by successive generations of miners since mining first began. But all at once it has leaped into importance, for it has been discovered to possess a remarkable property of hardening iron, and is now largely employed for armour-plated vessels. From being worth nothing it has risen to a rapidly rising value, as we are