Mathematical Theory and Modeling www.iiste.org 
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) 
Vol.4, No.9, 2014 
Optimizing Transformation for Linearity between Online 
Software Repository Variables. 
Ogunyinka, Peter I.1* and Badmus, Nofiu Idowu2
1. Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye, Nigeria 
2. Department of Statistics, Abraham Adesanya Polytechnic, Ijebu-Igbo, Nigeria 
* E-mail of the corresponding author: pixelgoldprod@yahoo.com 
Abstract 
Online software repositories (OSR) such as sourceforge.net and Google Code contain a wealth of valuable data
about software projects, but these data violate the linearity and normality assumptions, which makes them
unsuitable for most statistical analyses. To prepare the data for statistical analysis, they were transformed
non-linearly, and this research established the twelve (12) best transformed models that satisfy the linearity
assumption, achieve a higher coefficient of determination ($R^2$), show positive or negative relationships and
gain variable significance over the original data. The back transformation (interpretation) of each of these
twelve (12) best-ranked linear models is also provided, to address the difficulty of interpreting transformed data
that researchers encounter.
Keywords: Data transformation, Linear regression model, OkikiSoft, Online Software Repository and 
Sourceforge.net. 
1. Introduction 
An online software repository (OSR) is web-based storage for computer software, and it contains a wealth of
valuable information about software projects. Ahmed (2008) classified software repositories as historical
repositories, run-time repositories and code repositories; this research focuses on code repositories (CR).
Code repositories such as www.sourceforge.net and Google Code (www.code.google.com) host the source code of
applications written by many developers. According to Ahmed (2008), data available on OSRs very often exhibit a
large amount of noise and skew, and using such data may lead to incorrect results and conclusions. Ahmed (2008)
therefore recommends that software repository researchers closely study the noise and skew in the data and better
understand their effect on the analysis. Statistical visualization is essential for spotting such noise and skew.
He concludes that OSR researchers should provide guidelines and tools to improve the quality of repository data.
This research uses a mining software repository (MSR) tool called Okikisoft, artificial-intelligence software
developed by the authors to mine data automatically from the web pages of sourceforge.net. Statistical analysis of
the mined data revealed that the linearity assumption is violated between the repository variables. Such a
violation can lead to type-I or type-II errors, which calls for data transformation to improve the quality of the
repository data for subsequent analysis.
Why Sourceforge.net? 
According to Wikipedia (2014), sourceforge.net was the first free web-based source code repository to host free
and open-source software. Its competitors include GitHub (www.github.com), Google Code (www.code.google.com) and
JavaForge (www.javaforge.com). Alexa (2014) ranked sourceforge.net 162nd among the world's websites and first among
the aforementioned repositories. These characteristics, together with the hundreds of researchers and millions of
visitors who use the website, motivated this study of the repository.
2. Methodology 
Regression analysis requires linearity, normality, homoscedasticity and independence to be satisfied.
Osborne (2002) showed that violating these conditions can increase the probability of committing a type-I or
type-II error. For decades, data transformation has been recommended as a remedy for non-linearity and outliers,
among other problems. Data transformation applies a mathematical operation to change the measurement scale of one
or more variables. Transformations are either linear or non-linear: a linear transformation preserves the
relationship between variables, whereas a non-linear transformation changes (strengthens or weakens) that
relationship. Tabachnick and Fidell (2007), while acknowledging the importance of transformation, noted that some
authors do not accept it because transformed variables are difficult to interpret, but that it remains a
legitimate statistical tool for satisfying the linearity assumption. This research applies transformation
techniques to online software repository variables to establish relationships between them that are absent in
the original data.
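To illustrate the distinction between linear and non-linear transformations, the Python sketch below computes Pearson's correlation coefficient r before and after adding a constant (a linear transformation) and after taking $\log_{10}$ (a non-linear one). The values are invented for illustration and are not the repository data; `pearson_r` is a helper written here, not part of any library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, strongly right-skewed download counts and a response that
# grows roughly with the logarithm of downloads (illustrative data only).
downloads = [3, 10, 40, 150, 900, 5000, 30000]
ratings = [1.2, 2.0, 2.6, 3.1, 3.9, 4.4, 5.0]

r_original = pearson_r(downloads, ratings)
# Linear transformation: add 2 to every value (the shift used in Section 3.1).
r_shifted = pearson_r([d + 2 for d in downloads], ratings)
# Non-linear transformation: log10 of the downloads.
r_logged = pearson_r([math.log10(d) for d in downloads], ratings)

print(f"original r = {r_original:.3f}")
print(f"shifted  r = {r_shifted:.3f}  (unchanged by a linear shift)")
print(f"log10    r = {r_logged:.3f}  (changed by a non-linear transform)")
```

The shift leaves r untouched, while the logarithm changes it; here it strengthens the linear association because the invented response was built to follow $\log_{10}$ of the downloads.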
2.1 Linear Regression and Data Transformation 
The interpretation of data based on analysis of variance (ANOVA) is valid only if the assumptions of normality,
homogeneity of variance, independence and additivity are satisfied. Similarly, regression analysis requires
linearity in addition to the first three of these assumptions. O'Hara and Hotze (2010) emphasize that the main
purposes of data transformation are to make sample data conform to the assumptions of parametric statistics such
as ANOVA, the t-test and linear regression, or to manage outliers in a dataset. Marija (2004) argued that data
transformation is neither cheating nor a distortion of the true picture of the data; rather, it is a legitimate
statistical tool. The literature identifies interpreting the results of an analysis of transformed data as the
major challenge of this technique. One added benefit of most transformations, however, is that when data are
transformed to meet one assumption, they often come closer to satisfying others as well. For instance, a square
root transformation may help to equalize group variances, because it compresses the upper end of a distribution
more than the lower end. It may also make a positively skewed distribution more nearly normal in shape. As a
remedy for the interpretation problem, Howell (2007) recommended that researchers compare the means of both the
transformed and the original data and make sure that they tell the same basic story. Table 1 presents common ways
of transforming variables to achieve linearity for regression analysis.
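A small sketch of the square-root effect described above, using invented, positively skewed values: the transformation shrinks large values much more than small ones, and the sample skewness drops accordingly. The `skewness` helper uses the simple moment-based (population) definition, chosen here for brevity rather than any particular package's estimator.

```python
import math

def skewness(values):
    """Moment-based sample skewness (population moments, no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed sample (illustrative values only).
data = [1, 2, 2, 3, 4, 5, 7, 9, 14, 25, 49, 100]
rooted = [math.sqrt(v) for v in data]

# The square root shrinks the top of the range far more than the bottom:
# 100 -> 10 (a drop of 90) while 4 -> 2 (a drop of 2).
print(f"range before: {min(data)}..{max(data)}, after: {min(rooted):.1f}..{max(rooted):.1f}")
print(f"skewness before = {skewness(data):.2f}, after = {skewness(rooted):.2f}")
```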
Table 1: The common statistical transformation techniques
SN | Method | Transformed variable | Regression equation | Predicted / back-transformed value ($\hat{Y}$)
01 | Standard linear regression | None | $Y = b_0 + b_1 X$ | $\hat{Y} = b_0 + b_1 X$
02 | Exponential transformation | Dependent variable ($\log_{10} Y$) | $\log_{10} Y = b_0 + b_1 X$ | $\hat{Y} = 10^{(b_0 + b_1 X)}$
03 | Quadratic transformation | Dependent variable ($\sqrt{Y}$) | $\sqrt{Y} = b_0 + b_1 X$ | $\hat{Y} = (b_0 + b_1 X)^2$
04 | Reciprocal transformation | Dependent variable ($Y^{-1}$) | $Y^{-1} = b_0 + b_1 X$ | $\hat{Y} = 1/(b_0 + b_1 X)$
05 | Logarithm transformation | Independent variable ($\log_{10} X$) | $Y = b_0 + b_1 \log_{10} X$ | $\hat{Y} = b_0 + b_1 \log_{10} X$
06 | Power transformation | Dependent variable ($\log_{10} Y$) and independent variable ($\log_{10} X$) | $\log_{10} Y = b_0 + b_1 \log_{10} X$ | $\hat{Y} = 10^{(b_0 + b_1 \log_{10} X)}$
07 | Square transformation | Independent variable ($X^2$) | $Y = b_0 + b_1 X^2$ | $\hat{Y} = b_0 + b_1 X^2$
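The back-transformations in Table 1 can be written as small functions. The sketch below is illustrative only: the dictionary name `BACK_TRANSFORMS`, the helper `predict` and the coefficient values are hypothetical, and for the models that transform Y the returned value is the plain back-transformed prediction, which estimates the median rather than the mean of Y.

```python
import math

# Back-transformations for the seven models of Table 1.
# b0 and b1 are the fitted intercept and slope on the transformed scale.
BACK_TRANSFORMS = {
    "standard linear": lambda b0, b1, x: b0 + b1 * x,
    "exponential": lambda b0, b1, x: 10 ** (b0 + b1 * x),               # log10(Y) = b0 + b1*X
    "quadratic": lambda b0, b1, x: (b0 + b1 * x) ** 2,                  # sqrt(Y) = b0 + b1*X
    "reciprocal": lambda b0, b1, x: 1 / (b0 + b1 * x),                  # 1/Y = b0 + b1*X
    "logarithm": lambda b0, b1, x: b0 + b1 * math.log10(x),             # Y = b0 + b1*log10(X)
    "power": lambda b0, b1, x: 10 ** (b0 + b1 * math.log10(x)),         # log10(Y) = b0 + b1*log10(X)
    "square": lambda b0, b1, x: b0 + b1 * x ** 2,                       # Y = b0 + b1*X^2
}

def predict(method, b0, b1, x):
    """Return the predicted Y on the original scale for one of the Table 1 methods."""
    return BACK_TRANSFORMS[method](b0, b1, x)

# Hypothetical coefficients, purely to show the call pattern.
for name in BACK_TRANSFORMS:
    print(f"{name:16s} Y_hat(x=10) = {predict(name, 0.5, 0.2, 10):.4f}")
```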
In practice, these methods must be tested on the data to which they are applied to confirm that they increase,
rather than decrease, the strength of the linear relationship. Ways to check the effectiveness of a transformation
include confirming linearity, computing the coefficient of determination ($R^2$) and testing the significance of
the independent variable for the response variable. The $R^2$ of the transformed variables is expected to be
higher than that of the untransformed variables, and the independent variable is expected to be significant for
the response variable. Back transformation returns a transformed predicted value to its original scale. However,
back-transformed predictions estimate the median response rather than the mean response that is usually wanted.
Miller (1984) showed that back-transforming a predicted mean from the transformed scale results in serious bias,
and proposed a correction that reduces it. Jia and Rathi (2008), confirming the bias, proposed a more efficient
correction that almost removes it. Researchers are advised to consult these references for implementation details.
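The bias can be seen in a quick simulation. The sketch below assumes, for illustration only, that errors on the $\log_{10}$ scale are normal with standard deviation $\sigma$; under that assumption $E[Y] = 10^{\mu} \exp\{(\sigma \ln 10)^2/2\}$, so the plain back-transformation $10^{\mu}$ recovers the median and underestimates the mean. The correction factor shown is the standard lognormal one; we do not claim it is the exact procedure of Miller (1984) or of Jia and Rathi (2008). The values of `mu` and `sigma` are hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical data: at a fixed X, log10(Y) is normal with mean mu and sd sigma.
mu, sigma = 2.0, 0.4
log_y = [random.gauss(mu, sigma) for _ in range(100_000)]
y = [10 ** v for v in log_y]

mean_y = statistics.fmean(y)          # the mean response we would like to predict
mean_log = statistics.fmean(log_y)    # fitted value on the log10 scale
s = statistics.stdev(log_y)           # residual SD on the log10 scale

naive = 10 ** mean_log                # plain back-transformation (estimates the median)
# Lognormal correction: E[Y] = 10**mu * exp((ln(10) * sigma)**2 / 2)
corrected = naive * math.exp((math.log(10) * s) ** 2 / 2)

print(f"sample mean of Y  : {mean_y:8.2f}")
print(f"naive 10**fit     : {naive:8.2f}  (near the median, biased low)")
print(f"corrected estimate: {corrected:8.2f}")
```

With these settings the naive estimate sits near 100 while the mean of Y is roughly 50% higher; the correction closes most of that gap only because the simulated errors really are normal on the log scale.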
2.2 Mining of data 
Visitors to sourceforge.net can find, create, publish, rate and download free and open-source software from the
repository. Software is rated in two ways: a star rating and a rating against laid-down criteria. For the star
rating, a user awards 5, 4, 3, 2 or 1 star(s), and the average star rating is computed automatically.
The computation of the average rating is shown in Table 2.
Table 2: Average rating computation for FileZilla on sourceforge.net as at April 26, 2014.
(a) Rating star | (b) Number of raters | (c = a × b) Total
5 | 843 | 4215
4 | 11 | 44
3 | 6 | 18
2 | 4 | 8
1 | 113 | 113
Total | 977 | 4398
Average $= \sum_{i=1}^{5} c_i \big/ \sum_{i=1}^{5} b_i = 4398/977 \approx 4.5$
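The arithmetic of Table 2 can be reproduced directly. This minimal Python sketch uses the FileZilla counts from the table; the variable names are illustrative.

```python
# Star counts for FileZilla from Table 2: rating star -> number of raters.
raters = {5: 843, 4: 11, 3: 6, 2: 4, 1: 113}

total_raters = sum(raters.values())                                # sum of b_i
total_stars = sum(star * count for star, count in raters.items())  # sum of c_i = a_i * b_i
average = total_stars / total_raters                               # weighted mean rating

print(total_raters, total_stars, round(average, 1))  # prints: 977 4398 4.5
```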
Users also rate software against laid-down criteria: design, ease of use, features and support. The repository
records the total number of people who rate and download each piece of software. Perhaps unknown to visitors, the
repository system also collects information about users' operating systems and country locations.
Collecting data manually from sourceforge.net can be tedious, taking days or weeks depending on the volume of data
required. This research therefore uses a mining software repository (MSR) tool named Okikisoft, developed
specially for this research. Okikisoft automatically and invisibly visits the pages of sourceforge.net to extract
the required data, compiles the mined data into CSV files and saves them in the application folder. Okikisoft can
act as either a server-side or a client-side mining system.
3. Presentation and Analysis of the Mined Data 
The extracted variables are denoted as follows: D is the total number of downloads of a piece of software, F is
its file size, R is its average rating and T is the total number of visitors who rated it. Okikisoft mined data on
1802 pieces of software between February 1 and February 28, 2014, from which a random sample of size $n = 50$ was
selected for this research. The scatter plot matrix of the original data is presented in Figure 1.
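The paper does not describe how the 50 projects were drawn from the 1802 mined; the sketch below assumes simple random sampling without replacement. The project identifiers and the seed are hypothetical, and the real rows would come from the CSV files written by Okikisoft, whose layout is not described here.

```python
import random

N_MINED = 1802   # projects mined by Okikisoft
N_SAMPLE = 50    # sample size used in the analysis

# Hypothetical project identifiers 0..1801 stand in for rows of the mined CSV data.
rng = random.Random(2014)                            # fixed seed only for reproducibility
sample_ids = rng.sample(range(N_MINED), N_SAMPLE)    # draw without replacement

print(sorted(sample_ids)[:10], "...")
```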
Figure 1: Scatter plot matrix of the original data.

Figure 1 reveals that none of the plots in the scatter plot matrix can be assumed to be linear. Table 3 shows the
results of the regression analyses on the original data. The pairs (R, T), (R, F), (R, D) and (T, F) have very low
coefficients of determination ($R^2$) and independent variables that are not significant, while the significant
pairs (T, D) and (F, D) show negative relationships. This result agrees with Ahmed (2008): raw online repository
data may violate fundamental assumptions that researchers require. We therefore recommend that researchers study
the data before further analysis.

Table 3: Analysis results of original data.
Sn | Var1 = y | Var2 = x | Linear regression model | +/- r | $R^2$ | $H_0: \beta = 0$, $\alpha = 0.05$
1 | R | T | $R = \alpha + \beta T$ | + | 0.03% | P = 0.976: x not significant
2 | R | F | $R = \alpha + \beta F$ | + | 0.02% | P = 0.879: x not significant
3 | R | D | $R = \alpha + \beta D$ | - | 0.02% | P = 0.880: x not significant
4 | T | F | $T = \alpha + \beta F$ | + | 0.0% | P = 0.992: x not significant
5 (*) | T | D | $T = \alpha + \beta D$ | - | 23.6% | P = 0.000: x significant (*)
6 (*) | F | D | $F = \alpha + \beta D$ | - | 21.4% | P = 0.001: x significant (*)
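The quantities in Table 3 (slope, $R^2$ and the test of $H_0: \beta = 0$) come from an ordinary least-squares fit of y on x. The sketch below computes them with the standard library only. For a single predictor the t statistic can also be recovered from $R^2$ through $t^2 = R^2 (n-2)/(1-R^2)$, which gives a quick consistency check on row 5 of Table 3. The helper names are hypothetical, the toy data are invented, and the critical value 2.011 (two-sided, 5%, 48 degrees of freedom) is an approximation; in practice a statistics library would supply exact P-values.

```python
import math

def simple_ols(x, y):
    """Fit y = alpha + beta*x by least squares; return alpha, beta, R^2 and t for H0: beta = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    ss_res = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    se_beta = math.sqrt(ss_res / (n - 2) / sxx)   # standard error of the slope
    return alpha, beta, r2, beta / se_beta

def t_from_r2(r2, n):
    """|t| for the slope in simple regression, recovered from R^2 and the sample size."""
    return math.sqrt(r2 * (n - 2) / (1 - r2))

# Toy data, only to show the call pattern.
print(simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))

# Consistency check against Table 3, row 5 (T on D): R^2 = 23.6 %, n = 50.
t_row5 = t_from_r2(0.236, 50)
T_CRIT = 2.011   # approx. two-sided 5 % critical value of t with 48 df
print(f"|t| = {t_row5:.2f}, significant: {t_row5 > T_CRIT}")
```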
3.1 Data transformation and analysis 
To avoid values between 0 and 1 in the original data (Osborne, 2002), a linear transformation was first applied
by adding 2 to every value of the four variables. The shifted variables were then transformed non-linearly with
seven functions: $\log_2 y$, $\log_{10} y$, $\sqrt{y}$, $y^2$, $\sqrt[3]{y}$, $y^3$ and $y^{-1}$. Table 4 shows
the results of the data transformation, and Figure 2 shows the scatter plot matrix of the transformed data.
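The two-step procedure just described (shift by 2, then apply each of the seven non-linear functions) can be sketched as follows. The function and dictionary names and the sample values are hypothetical, and the sketch assumes the original variables are non-negative (counts, file sizes, ratings), so every shifted value is at least 2. Because the models are fitted to functions of (value + 2), a back-transformed prediction would also need 2 subtracted to return to the original scale.

```python
import math

# The seven non-linear transformations listed in Section 3.1.
TRANSFORMS = {
    "log2": math.log2,
    "log10": math.log10,
    "sqrt": math.sqrt,
    "square": lambda v: v ** 2,
    "cbrt": lambda v: v ** (1 / 3),   # real-valued here because shifted values are positive
    "cube": lambda v: v ** 3,
    "recip": lambda v: 1 / v,
}

def shift_and_transform(values, shift=2):
    """Add `shift` to every value (linear step), then apply each non-linear transform."""
    shifted = [v + shift for v in values]
    return {name: [f(v) for v in shifted] for name, f in TRANSFORMS.items()}

# Hypothetical download totals (the repository sample itself is not reproduced here).
downloads = [0, 1, 14, 230, 5120]
out = shift_and_transform(downloads)
print([round(v, 3) for v in out["log10"]])
```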
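Table 4: Analysis results of transformed data.
Sn | Var1 = y | Var2 = x | Linear regression model | +/- r | Back transformation | $R^2$ | Ranking
1 | $T$ | $\log_{10} D = D^*$ | $T = \alpha + \beta D^*$ | + | $\hat{T} = \alpha + \beta D^*$ | 55.7% | 1st
2 | $\log_{10} D = D^*$ | $\sqrt[3]{T} = T^*$ | $D^* = \alpha + \beta T^*$ | + | $\hat{D} = 10^{(\alpha + \beta T^*)}$ | 52.2% | 2nd
3 | $\log_{10} T = T^*$ | $\log_{10} D = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = 10^{(\alpha + \beta D^*)}$ | 46.4% | 3rd
4 | $\sqrt{D} = D^*$ | $\sqrt[3]{T} = T^*$ | $D^* = \alpha + \beta T^*$ | + | $\hat{D} = (\alpha + \beta T^*)^2$ | 41.6% | 4th
5 | $\log_2 T = T^*$ | $\log_{10} D = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = 2^{(\alpha + \beta D^*)}$ | 41.2% | 5th
6 | $\sqrt{D} = D^*$ | $T^2 = T^*$ | $D^* = \alpha + \beta T^*$ | + | $\hat{D} = (\alpha + \beta T^*)^2$ | 40.6% | 6th
7 | $T$ | $\sqrt{D} = D^*$ | $T = \alpha + \beta D^*$ | + | $\hat{T} = \alpha + \beta D^*$ | 40.5% | 7th
8 | $\sqrt{T} = T^*$ | $\sqrt{D} = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = (\alpha + \beta D^*)^2$ | 35.5% | 8th
9 | $\log_{10} T = T^*$ | $D^{-1} = D^*$ | $T^* = \alpha + \beta D^*$ | - | $\hat{T} = 10^{(\alpha + \beta D^*)}$ | 31.1% | 9th
10 | $T^3 = T^*$ | $D$ | $T^* = \alpha + \beta D$ | + | $\hat{T} = \sqrt[3]{\alpha + \beta D}$ | 31.0% | 10th
11 | $D$ | $T^2 = T^*$ | $D = \alpha + \beta T^*$ | + | $\hat{D} = \alpha + \beta T^*$ | 29.0% | 11th
12 | $T^3 = T^*$ | $\sqrt[3]{D} = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = \sqrt[3]{\alpha + \beta D^*}$ | 27.1% | 12th
13 | $\log_{10} D = D^*$ | $T^{-1} = T^*$ | $D^* = \alpha + \beta T^*$ | - | $\hat{D} = 10^{(\alpha + \beta T^*)}$ | 17.5% | -
14 | $T^{-1} = T^*$ | $D^{-1} = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = (\alpha + \beta D^*)^{-1}$ | 13.9% | -

Figure 2: Scatter plot matrix of the first 3 ranked models.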
4. Discussion of findings 
Linearity, or approximate linearity, was established between the transformed variables. Table 4 shows the $R^2$
values, the linear regression models and the back transformations (interpretations) for the variable pairs that
became linearly related after transformation. Twelve (12) of the fourteen transformed linear regression models
show a positive relationship between the variables, and the independent variable is significant (at the 5% level)
for the corresponding dependent variable. These transformations achieved linearity only between T and D; the
relationships between the other variables remained non-linear, and we hope future research will address this.
Since the results in Table 4 are linear and significant, the models were ranked by their $R^2$ values, as shown in
the last column of Table 4. The 12 ranked models all have $R^2 > 23.6\%$, the highest $R^2$ obtained for the
original data (between T and D in Table 3). Figure 2 shows the scatter plot matrix of the first three ranked
models. In Table 4, y denotes the dependent variable and x the independent variable; researchers should note that
the two can be interchanged, but doing so changes the back transformation of the model. To avoid the bias that
back transformation introduces into predictions of a transformed dependent variable (Miller, 1984), we recommend
models 1, 7 and 11, since they do not transform the dependent variable.
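The ranking rule described above (keep the models whose $R^2$ exceeds the best original-data value of 23.6% and sort them in descending order of $R^2$) can be written in a few lines. The $R^2$ values are copied from Table 4; the variable names are illustrative.

```python
# R^2 (%) of the fourteen transformed models in Table 4, keyed by model number.
r2_by_model = {1: 55.7, 2: 52.2, 3: 46.4, 4: 41.6, 5: 41.2, 6: 40.6, 7: 40.5,
               8: 35.5, 9: 31.1, 10: 31.0, 11: 29.0, 12: 27.1, 13: 17.5, 14: 13.9}
BASELINE = 23.6  # best R^2 on the original data (T on D, Table 3)

# Keep models that beat the untransformed baseline, best first.
ranked = sorted((m for m, r2 in r2_by_model.items() if r2 > BASELINE),
                key=lambda m: r2_by_model[m], reverse=True)

for rank, model in enumerate(ranked, start=1):
    print(f"rank {rank:2d}: model {model:2d}  R^2 = {r2_by_model[model]}%")
```

Models 13 and 14 fall below the baseline and are therefore left unranked, which matches the twelve ranked models in Table 4.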
5. Conclusion 
In this research, we used statistical data transformation tools to prepare data mined from an online software
repository (a case study of sourceforge.net) for subsequent analysis by researchers seeking to improve the
efficiency of repositories. We found that data from online repositories violate the linearity assumption and may
lack the significance required for regression analysis. We therefore recommend that researchers study data mined
from online repositories before using them in analysis. Combinations of non-linear transformations proved
effective in establishing linearity between repository variables. We also found that, after transformation, only
the download total (D) and the total number of visitors who rate software (T) on sourceforge.net can be linearly
related. This research provides a ranked list of the 12 best linear regression models for direct use by
researchers in future analysis.
References 
Ahmed, E. Hassan (2008), "A road ahead for mining software repositories", IEEE, pp. 48-57, doi:10.1109/FOSM.2008.4659248.
Alexa (2014), www.alexa.com. Retrieved on April 20, 2014.
Cleveland, W.S. (1984), "Graphical methods for data presentation: Full scale breaks, dot charts, and multibased logging", The American Statistician, Vol. 38(4), pp. 270-280.
Howell, D.C. (2007), "Statistical Methods for Psychology", 6th Edition, Belmont, CA: Thomson Wadsworth.
Jia, Siwei and Rathi, Sarika (2008), "On predicting log-transformed linear models with heteroscedasticity", SAS Global Forum, Paper 370-2008.
Manikandan, S. (2010), "Data transformation", J. Pharmacol. Pharmacother., Vol. 1(2), Jul.-Dec., pp. 126-127, doi:10.4103/0976-500x.72373.
Marija, J. Norusis (2004), "SPSS 12.0 Guide to Data Analysis", Prentice Hall Inc., ISBN 0-13-147886-9.
Miller, D. (1984), "Reducing transformation bias in curve fitting", The American Statistician, Vol. 38(2), pp. 124-126.
O'Hara, Robert B. and Hotze, Johan D. (2010), "Do not log-transform count data", Methods in Ecology and Evolution, doi:10.1111/j.2041-210x.2010.00021.x.
Osborne, Jason (2002), "Notes on the use of data transformations", Practical Assessment, Research and Evaluation, Vol. 8(6).
Tabachnick, B.G. and Fidell, L.S. (2007), "Using Multivariate Statistics", 5th Edition, Boston: Allyn and Bacon.
Tukey, J.W. (1977), "Exploratory Data Analysis", Reading, MA: Addison-Wesley.
Wikipedia (2014), SourceForge, www.en.wikipedia.org/wiki/sourceforge. Retrieved on April 20, 2014.
www.sourceforge.net, Mining Software (Okikisoft). Retrieved on March 13, 2014.
210
The IISTE is a pioneer in the Open-Access hosting service and academic event 
management. The aim of the firm is Accelerating Global Knowledge Sharing. 
More information about the firm can be found on the homepage: 
http://www.iiste.org 
CALL FOR JOURNAL PAPERS 
There are more than 30 peer-reviewed academic journals hosted under the hosting 
platform. 
Prospective authors of journals can find the submission instruction on the 
following page: http://www.iiste.org/journals/ All the journals articles are available 
online to the readers all over the world without financial, legal, or technical barriers 
other than those inseparable from gaining access to the internet itself. Paper version 
of the journals is also available upon request of readers and authors. 
MORE RESOURCES 
Book publication information: http://www.iiste.org/book/ 
IISTE Knowledge Sharing Partners 
EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open 
Archives Harvester, Bielefeld Academic Search Engine, Elektronische 
Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial 
Library , NewJour, Google Scholar

More Related Content

PDF
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
PDF
Data Imputation by Soft Computing
PDF
IRJET- Missing Data Imputation by Evidence Chain
PDF
20 26 jan17 walter latex
PDF
Effective data mining for proper
PDF
TECHNICAL REVIEW: PERFORMANCE OF EXISTING IMPUTATION METHODS FOR MISSING DATA...
PDF
Particle Swarm Optimization based K-Prototype Clustering Algorithm
PDF
Enhance The Technique For Searching Dimension Incomplete Databases
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
Data Imputation by Soft Computing
IRJET- Missing Data Imputation by Evidence Chain
20 26 jan17 walter latex
Effective data mining for proper
TECHNICAL REVIEW: PERFORMANCE OF EXISTING IMPUTATION METHODS FOR MISSING DATA...
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Enhance The Technique For Searching Dimension Incomplete Databases

What's hot (17)

PDF
An effective adaptive approach for joining data in data
PDF
Survey on Feature Selection and Dimensionality Reduction Techniques
PDF
Enhancement techniques for data warehouse staging area
PDF
The pertinent single-attribute-based classifier for small datasets classific...
PPTX
Metabolomic Data Analysis Workshop and Tutorials (2014)
PDF
Application of data mining tools for
PDF
GCUBE INDEXING
PPTX
High Dimensional Biological Data Analysis and Visualization
PDF
M033059064
PDF
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
PDF
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
PDF
A statistical data fusion technique in virtual data integration environment
PDF
An Approach for Big Data to Evolve the Auspicious Information from Cross-Domains
PDF
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
PDF
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
PPT
Strategies for Metabolomics Data Analysis
PDF
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
An effective adaptive approach for joining data in data
Survey on Feature Selection and Dimensionality Reduction Techniques
Enhancement techniques for data warehouse staging area
The pertinent single-attribute-based classifier for small datasets classific...
Metabolomic Data Analysis Workshop and Tutorials (2014)
Application of data mining tools for
GCUBE INDEXING
High Dimensional Biological Data Analysis and Visualization
M033059064
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A statistical data fusion technique in virtual data integration environment
An Approach for Big Data to Evolve the Auspicious Information from Cross-Domains
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
Strategies for Metabolomics Data Analysis
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
Ad

Similar to Optimizing transformation for linearity between online (20)

PPTX
Matlab: Regression
PPTX
Matlab:Regression
DOC
Cyb 5675 class project final
PDF
Machine Learning.pdf
DOCX
QSO 510 Final Project Case Addendum Vice-president Arun.docx
PDF
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
PPT
Computing Transformations Spring2005
PPT
Computingtransformations Spring2005
PDF
03-Data-Analysis-Final.pdf
PDF
Analytics Types.pdfdvf ifbvuibugdfiubuibubufdibhdfiubfduibhfiuvdih
PDF
Building Azure Machine Learning Models
PPT
Kevin Swingler: Introduction to Data Mining
PDF
Ibm spss statistics 19 brief guide
DOC
Open06
PDF
Preliminary Modeling Report
PDF
Regression, theil’s and mlp forecasting models of stock index
PDF
Regression, theil’s and mlp forecasting models of stock index
PDF
Regression, theil’s and mlp forecasting models of stock index
PDF
Data mining 2012 generalwithmethods
DOCX
MAT 510 Entire Course NEW
Matlab: Regression
Matlab:Regression
Cyb 5675 class project final
Machine Learning.pdf
QSO 510 Final Project Case Addendum Vice-president Arun.docx
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Computing Transformations Spring2005
Computingtransformations Spring2005
03-Data-Analysis-Final.pdf
Analytics Types.pdfdvf ifbvuibugdfiubuibubufdibhdfiubfduibhfiuvdih
Building Azure Machine Learning Models
Kevin Swingler: Introduction to Data Mining
Ibm spss statistics 19 brief guide
Open06
Preliminary Modeling Report
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock index
Data mining 2012 generalwithmethods
MAT 510 Entire Course NEW
Ad

More from Alexander Decker (20)

PDF
Abnormalities of hormones and inflammatory cytokines in women affected with p...
PDF
A validation of the adverse childhood experiences scale in
PDF
A usability evaluation framework for b2 c e commerce websites
PDF
A universal model for managing the marketing executives in nigerian banks
PDF
A unique common fixed point theorems in generalized d
PDF
A trends of salmonella and antibiotic resistance
PDF
A transformational generative approach towards understanding al-istifham
PDF
A time series analysis of the determinants of savings in namibia
PDF
A therapy for physical and mental fitness of school children
PDF
A theory of efficiency for managing the marketing executives in nigerian banks
PDF
A systematic evaluation of link budget for
PDF
A synthetic review of contraceptive supplies in punjab
PDF
A synthesis of taylor’s and fayol’s management approaches for managing market...
PDF
A survey paper on sequence pattern mining with incremental
PDF
A survey on live virtual machine migrations and its techniques
PDF
A survey on data mining and analysis in hadoop and mongo db
PDF
A survey on challenges to the media cloud
PDF
A survey of provenance leveraged
PDF
A survey of private equity investments in kenya
PDF
A study to measures the financial health of
Abnormalities of hormones and inflammatory cytokines in women affected with p...
A validation of the adverse childhood experiences scale in
A usability evaluation framework for b2 c e commerce websites
A universal model for managing the marketing executives in nigerian banks
A unique common fixed point theorems in generalized d
A trends of salmonella and antibiotic resistance
A transformational generative approach towards understanding al-istifham
A time series analysis of the determinants of savings in namibia
A therapy for physical and mental fitness of school children
A theory of efficiency for managing the marketing executives in nigerian banks
A systematic evaluation of link budget for
A synthetic review of contraceptive supplies in punjab
A synthesis of taylor’s and fayol’s management approaches for managing market...
A survey paper on sequence pattern mining with incremental
A survey on live virtual machine migrations and its techniques
A survey on data mining and analysis in hadoop and mongo db
A survey on challenges to the media cloud
A survey of provenance leveraged
A survey of private equity investments in kenya
A study to measures the financial health of

Recently uploaded (20)

PDF
Laughter Yoga Basic Learning Workshop Manual
PPTX
2025 Product Deck V1.0.pptxCATALOGTCLCIA
PDF
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
DOCX
unit 1 COST ACCOUNTING AND COST SHEET
PDF
Ôn tập tiếng anh trong kinh doanh nâng cao
PDF
Power and position in leadershipDOC-20250808-WA0011..pdf
PPTX
Board-Reporting-Package-by-Umbrex-5-23-23.pptx
PDF
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
PDF
Reconciliation AND MEMORANDUM RECONCILATION
PPTX
Amazon (Business Studies) management studies
DOCX
unit 2 cost accounting- Tender and Quotation & Reconciliation Statement
PPTX
Belch_12e_PPT_Ch18_Accessible_university.pptx
PDF
A Brief Introduction About Julia Allison
PDF
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
PDF
Roadmap Map-digital Banking feature MB,IB,AB
PPTX
AI-assistance in Knowledge Collection and Curation supporting Safe and Sustai...
PPTX
CkgxkgxydkydyldylydlydyldlyddolydyoyyU2.pptx
DOCX
Euro SEO Services 1st 3 General Updates.docx
PDF
Elevate Cleaning Efficiency Using Tallfly Hair Remover Roller Factory Expertise
PDF
Tata consultancy services case study shri Sharda college, basrur
Laughter Yoga Basic Learning Workshop Manual
2025 Product Deck V1.0.pptxCATALOGTCLCIA
pdfcoffee.com-opt-b1plus-sb-answers.pdfvi
unit 1 COST ACCOUNTING AND COST SHEET
Ôn tập tiếng anh trong kinh doanh nâng cao
Power and position in leadershipDOC-20250808-WA0011..pdf
Board-Reporting-Package-by-Umbrex-5-23-23.pptx
kom-180-proposal-for-a-directive-amending-directive-2014-45-eu-and-directive-...
Reconciliation AND MEMORANDUM RECONCILATION
Amazon (Business Studies) management studies
unit 2 cost accounting- Tender and Quotation & Reconciliation Statement
Belch_12e_PPT_Ch18_Accessible_university.pptx
A Brief Introduction About Julia Allison
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
Roadmap Map-digital Banking feature MB,IB,AB
AI-assistance in Knowledge Collection and Curation supporting Safe and Sustai...
CkgxkgxydkydyldylydlydyldlyddolydyoyyU2.pptx
Euro SEO Services 1st 3 General Updates.docx
Elevate Cleaning Efficiency Using Tallfly Hair Remover Roller Factory Expertise
Tata consultancy services case study shri Sharda college, basrur

Optimizing transformation for linearity between online

  • 1. Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.4, No.9, 2014 Optimizing Transformation for Linearity between Online Software Repository Variables. Ogunyinka, Peter I1*and Badmus, Nofiu Idowu2 1. Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye, Nigeria 2. Department of Statistics, Abraham Adesanya Polytechnic, Ijebu-Igbo, Nigeria * E-mail of the corresponding author: pixelgoldprod@yahoo.com Abstract Online Software Repositories (OSR) like sourceforge.net and google code contain a wealth of valuable data about software projects but these data violate the linearity and normal assumptions, hence making the data impossible for use in most statistical data analysis. To prepare these data for statistical data analysis, the data were non-linearly transformed, hence, these research established the best twelve (12) transformed model that obey linearity assumptions, higher coefficient of determination (푅2), positive and negative relationship and gained variable significance over the original data. Similarly, the back transformation or interpretation was provided about each of these twelve (12) best ranked linear models to solve the challenges of data transformation encountered by researchers. Keywords: Data transformation, Linear regression model, OkikiSoft, Online Software Repository and Sourceforge.net. 1. Introduction Online Software repository (OSR) is a web based storage of Computer software. OSR contains a wealth of valuable information about software projects. Ahmed (2008) gave types of software repository as historical repositories, run-time repositories and Code repositories. This research focuses on the code repositories (CR). CR, such as www.sourceforge.net and google code (www.code.google.com), host the source codes of various applications developed by several developers. According to Ahmed (2008), very often, data available on OSR exhibit large amount of noise and skew. 
The use of such data may lead to incorrect results and conclusions. Ahmed (2008) recommends that software repository researchers should closely study the noise and skew in the data and better understand the effect on the analysis. Statistical visualization is essential to spot the noise and skewness. He concluded in his recommendation that OSR researchers should provide guidelines and tools to improve the quality of repository data. This research uses mining software repository (MSR) software called OKIKISOFT. Okikisoft is the authors developed artificial intelligence software for automatic mining of data on the webpages of sourceforge.net. The statistical analysis of the data mined on the repository revealed the violation of linearity assumption between the repository variables. This violation of statistical assumption can lead to type-I or type-II error, hence, calling for data transformation for the improvement of the quality of the repository data for subsequent analysis. Why Sourceforge.net? Wikipedia (2014) established sourceforge.net as the first free web based source code repository that hosts free and open source software. Among its competitors are Github (www.github.com), Google Code (www.code.google.com) and Javaforge (www.javaforge.com). Alexa (2014) rates sourceforge as 162nd world best website and the first best repository among the aforementioned competitors. These characteristics have called the attention of this research to take a study of the repository for the hundreds of researchers and millions of visitors visiting the website. 2. Methodology Regression analysis requires the satisfaction of linearity, normality, homoscedasticity and independence. Osborne (2002) established that the violation of the conditions can increase the probability of committing type-I or type-II error. Over decades, data transformation has been recommended as the solution to non-linearity, outliers, among others. 
Data transformation involves applying a mathematical operation to change the measurement scale of one or more variables. Data transformations are either linear or non-linear. A linear transformation retains the relationship between variables, while a non-linear transformation changes (strengthens or weakens) the relationship between variables. Tabachnick and Fidell (2007), acknowledging the importance of transformation, stated that it may not be generally accepted by authors because transformed variables are difficult to interpret, but that it remains a legitimate statistical tool for satisfying the linearity assumption. This research applies transformation techniques to online software repository variables to establish relationships between these variables that were absent in the original data.

2.1 Linear Regression and Data Transformation
The interpretation of data based on analysis of variance (ANOVA) is valid only if the assumptions of normality, homogeneity (of variance), independence and additivity are satisfied. Similarly, regression analysis requires linearity in addition to the first three of these assumptions. O'Hara and Kotze (2010) emphasize that the main purpose of data transformation is to make sample data conform to the assumptions of parametric statistics such as ANOVA, the t-test and linear regression, or to manage outliers in a dataset. Marija (2004) established that data transformation is neither cheating nor a distortion of the true picture of the data under consideration; rather, it is a legitimate statistical tool. The literature has established that interpreting the results of a transformed-data analysis is the major challenge facing this technique. However, an added benefit of most transformation techniques is that when data are transformed to meet one assumption, they often come closer to satisfying other assumptions as well. For instance, the square root transformation may help to equalize group variances by compressing the upper end of the distribution more than the lower end. It may also make a positively skewed distribution more nearly normal in shape.
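To make the square-root claim concrete, the following minimal Python sketch applies the transformation to a small, made-up positively skewed sample (not data from this study). It compares how much the gap between the two smallest values and the gap between the two largest values shrink, and computes a simple moment-based skewness; the helper name `skewness` is a hypothetical choice for illustration.

```python
import math
import statistics


def skewness(values):
    """Moment-based (population) skewness: the mean of cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical positively skewed sample, sorted ascending (illustrative only).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40, 100]
rooted = [math.sqrt(v) for v in raw]

# Gaps at the lower and upper ends, before and after the square root.
low_gap_raw, high_gap_raw = raw[1] - raw[0], raw[-1] - raw[-2]
low_gap_sqrt, high_gap_sqrt = rooted[1] - rooted[0], rooted[-1] - rooted[-2]

print(f"skewness raw:  {skewness(raw):.2f}")
print(f"skewness sqrt: {skewness(rooted):.2f}")
# The upper-end gap shrinks by a much larger factor than the lower-end gap.
print(f"upper-end gap shrinks by a factor of {high_gap_raw / high_gap_sqrt:.1f}")
print(f"lower-end gap shrinks by a factor of {low_gap_raw / low_gap_sqrt:.1f}")
```

The skewness drops after the transformation, and the distance between 40 and 100 is compressed far more than the distance between 1 and 2, which is the "compress the upper end more than the lower end" behaviour described above.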
Howell (2007) recommended, as a solution to the interpretation problem of data transformation, that researchers look at both the transformed and the original data means and make sure that they tell the same basic story. Table 1 presents the common ways of transforming variables to achieve linearity for regression analysis.

Table 1: The common statistical transformation techniques

| SN | Method | Transformed variable | Regression equation | Predicted/back-transformed value ($\hat{Y}$) |
|---|---|---|---|---|
| 01 | Standard linear regression | None | $Y = b_0 + b_1X$ | $\hat{Y} = b_0 + b_1X$ |
| 02 | Exponential transformation | Dependent variable ($\log_{10}Y$) | $\log_{10}Y = b_0 + b_1X$ | $\hat{Y} = 10^{b_0 + b_1X}$ |
| 03 | Quadratic transformation | Dependent variable ($\sqrt{Y}$) | $\sqrt{Y} = b_0 + b_1X$ | $\hat{Y} = (b_0 + b_1X)^2$ |
| 04 | Reciprocal transformation | Dependent variable ($Y^{-1}$) | $Y^{-1} = b_0 + b_1X$ | $\hat{Y} = 1/(b_0 + b_1X)$ |
| 05 | Logarithm transformation | Independent variable ($\log_{10}X$) | $Y = b_0 + b_1\log_{10}X$ | $\hat{Y} = b_0 + b_1\log_{10}X$ |
| 06 | Power transformation | Dependent variable ($\log_{10}Y$) and independent variable ($\log_{10}X$) | $\log_{10}Y = b_0 + b_1\log_{10}X$ | $\hat{Y} = 10^{b_0 + b_1\log_{10}X}$ |
| 07 | Square transformation | Independent variable ($X^2$) | $Y = b_0 + b_1X^2$ | $\hat{Y} = b_0 + b_1X^2$ |

In practice, these methods need to be tested on the data to which they are applied, to confirm that they increase rather than decrease the strength of the linear relationship. Ways of assessing the effectiveness of a transformation include checking linearity, obtaining the coefficient of determination ($R^2$) and conducting a significance test of the independent variable on the response variable. The $R^2$ of the transformed variables is expected to be higher than that of the untransformed variables, and the independent variables are expected to be significant for the response variable. Back transformation is used to return a transformed predicted value to its original scale.
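The back transformations in Table 1 can be sketched in Python as below. The function names `predict` and `mean_corrected_exponential` are illustrative assumptions, and `b0`, `b1` are assumed to come from an ordinary least-squares fit on the transformed scale. The second function is a hedged sketch of the standard lognormal mean adjustment, assuming normally distributed residuals with variance `sigma2` on the $\log_{10}$ scale; it is not necessarily the exact correction proposed by Miller (1984) or by Jia and Rathi (2008).

```python
import math


def predict(method, b0, b1, x):
    """Predicted Y on the original scale for the Table 1 methods (hypothetical helper).

    b0 + b1 * x' is the fitted value on the transformed scale; each branch
    undoes the transformation that was applied to Y (if any).
    """
    if method == "standard":      # Y = b0 + b1 X
        return b0 + b1 * x
    if method == "exponential":   # log10 Y = b0 + b1 X
        return 10 ** (b0 + b1 * x)
    if method == "quadratic":     # sqrt(Y) = b0 + b1 X
        return (b0 + b1 * x) ** 2
    if method == "reciprocal":    # 1/Y = b0 + b1 X
        return 1 / (b0 + b1 * x)
    if method == "logarithm":     # Y = b0 + b1 log10 X (only X transformed)
        return b0 + b1 * math.log10(x)
    if method == "power":         # log10 Y = b0 + b1 log10 X
        return 10 ** (b0 + b1 * math.log10(x))
    if method == "square":        # Y = b0 + b1 X^2 (only X transformed)
        return b0 + b1 * x ** 2
    raise ValueError(f"unknown method: {method}")


def mean_corrected_exponential(b0, b1, x, sigma2):
    """Assumes log10 residuals ~ Normal(0, sigma2).

    Then E[Y] = 10**(b0 + b1*x) * exp((ln 10)**2 * sigma2 / 2); the plain
    back transform 10**(b0 + b1*x) estimates the median, not the mean.
    """
    naive = predict("exponential", b0, b1, x)
    return naive * math.exp(math.log(10) ** 2 * sigma2 / 2)
```

For example, with $b_0 = 0.5$ and $b_1 = 0.2$, the exponential model predicts $10^{1} = 10$ at $X = 2.5$; any positive residual variance makes the mean-corrected value larger than this median estimate.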
Back-transformed predicted values estimate the median response rather than the mean response that is usually expected. Miller (1984) established that back-transforming the predicted mean of a transformed dependent variable results in serious bias, and he further established a solution that minimizes the bias. Jia and Rathi (2008), confirming the bias, established a more efficient solution that almost removes it. Researchers are advised to consult these references for implementation.

2.2 Mining of data
Visitors to sourceforge can find, create, publish, rate and download free and open source software on the repository. Software is rated on two criteria, namely star rating and laid-down criteria rating. A user can award 5, 4, 3, 2 or 1 star(s), after which the average star rating is computed automatically. The computation of the average rating is shown in Table 2.

Table 2: Average rating computation for filezilla on sourceforge.net as at April 26, 2014.

| (a) Rating star | (b) Number of raters | (c = a × b) Total |
|---|---|---|
| 5 | 843 | 4215 |
| 4 | 11 | 44 |
| 3 | 6 | 18 |
| 2 | 4 | 8 |
| 1 | 113 | 113 |
| Total | 977 | 4398 |
| Average ($\sum_{i=1}^{5} c_i \,/\, \sum_{i=1}^{5} b_i$) | | 4.5 |

Similarly, the laid-down criteria, which include design, ease, features and support, are considered when users rate software. The repository records the total number of people who rate and download each software. Perhaps unknown to the visitors, the repository system also collects some information about the users' operating system makers and country locations.

Manually collecting data on sourceforge can be tedious, spanning days or weeks depending on the volume of data involved. This research uses a mining software repository (MSR) tool named Okikisoft, which was developed specifically for this research. Okikisoft automatically and invisibly visits the pages of sourceforge to extract the required data, compiles the mined data into CSV files and saves them in the application folder. Okikisoft can act as a server-side or client-side mining system.
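The average rating in Table 2 is a weighted mean of the star values, with the number of raters as weights. A minimal Python check using the filezilla counts reported in the table:

```python
# Star counts for filezilla as reported in Table 2 (April 26, 2014).
star_counts = {5: 843, 4: 11, 3: 6, 2: 4, 1: 113}

total_raters = sum(star_counts.values())                        # total of column (b)
total_stars = sum(star * n for star, n in star_counts.items())  # total of column (c)
average_rating = total_stars / total_raters                     # weighted mean

print(total_raters, total_stars, round(average_rating, 1))  # prints: 977 4398 4.5
```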
3. Presentation and Analysis of the Mined Data
The extracted variables are denoted as follows: D is the download total for a software, F is the file size of the software, R is the average rating of the software and T is the total number of visitors who rated the software. Okikisoft mined data on 1802 software projects between February 1 and February 28, 2014, from which a random sample of $n = 50$ was selected for this research. The scatter plot matrix of the original data is presented in Figure 1.
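Selecting 50 of the 1802 mined records amounts to simple random sampling without replacement. The sketch below assumes, hypothetically, that the Okikisoft CSV export has one row per project with columns named `D`, `F`, `R` and `T`; the actual file layout is not described in this paper, and the usage data are synthetic. The function names are illustrative.

```python
import csv
import io
import random


def load_records(handle):
    """Read mined records from a CSV file object.

    Assumption: columns are named D, F, R and T (hypothetical layout).
    """
    return [
        {key: float(row[key]) for key in ("D", "F", "R", "T")}
        for row in csv.DictReader(handle)
    ]


def simple_random_sample(records, n=50, seed=None):
    """Draw n records without replacement; every record is equally likely."""
    return random.Random(seed).sample(records, n)


# Usage with synthetic stand-in data for 1802 mined projects (not real data).
fake_csv = io.StringIO(
    "D,F,R,T\n"
    + "\n".join(f"{i * 10},{i % 7 + 1},{i % 5 + 1},{i % 30}" for i in range(1802))
)
records = load_records(fake_csv)
sample = simple_random_sample(records, n=50, seed=2014)
```

Fixing the seed makes the drawn sample reproducible, which helps when the same 50 projects must be reused across the original and transformed analyses.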
Figure 1: Scatter plot matrix of the original data.

Figure 1 reveals that none of the plots in the scatter plot matrix can be assumed to be linear. Similarly, Table 3 shows the results of the analysis of the original data. Very low coefficients of determination ($R^2$) and insignificance of the independent variable were found between R and T, R and F, R and D, and T and F, while negative relationships were found between T and D and between F and D. This result agrees with Ahmed (2008) that original online repository data may violate fundamental assumptions required by researchers; hence, we recommend that researchers study the data before further analysis.

Table 3: Analysis results of original data.

| Sn | Var1 = y | Var2 = x | Linear regression model | +/− r | $R^2$ | $H_0: \beta = 0$ ($\alpha = 0.05$) |
|---|---|---|---|---|---|---|
| 1 | R | T | $R = \alpha + \beta T$ | + | 0.03% | $P = 0.976$: x insignificant |
| 2 | R | F | $R = \alpha + \beta F$ | + | 0.02% | $P = 0.879$: x insignificant |
| 3 | R | D | $R = \alpha + \beta D$ | − | 0.02% | $P = 0.880$: x insignificant |
| 4 | T | F | $T = \alpha + \beta F$ | + | 0.0% | $P = 0.992$: x insignificant |
| 5 (*) | T | D | $T = \alpha + \beta D$ | − | 23.6% | $P = 0.000$: x significant (*) |
| 6 (*) | F | D | $F = \alpha + \beta D$ | − | 21.4% | $P = 0.001$: x significant (*) |

3.1 Data transformation and analysis
To avoid values between 0 and 1 in the original data (Osborne, 2002), a linear transformation was applied by adding 2 to every value of the four variables. The linearly transformed variables were then non-linearly transformed with seven tools, namely $\log_2 y$, $\log_{10} y$, $\sqrt{y}$, $y^2$, $\sqrt[3]{y}$, $y^3$ and $y^{-1}$. Table 4 shows the results of the data transformation and Figure 2 shows the scatter plot matrix of the first three ranked models.
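The transformation step of Section 3.1 and the checks reported in Tables 3 and 4 ($R^2$ and a t test of $H_0: \beta = 0$) can be sketched in plain Python as follows. The names `TRANSFORMS`, `shift_and_transform` and `simple_ols` are hypothetical, and the data are synthetic, not the sourceforge sample. The example also shows that the linear +2 shift leaves $R^2$ unchanged, while a non-linear transform can raise it.

```python
import math

# The seven non-linear transformations of Section 3.1, applied after a +2 shift.
TRANSFORMS = {
    "log2": math.log2,
    "log10": math.log10,
    "sqrt": math.sqrt,
    "square": lambda v: v ** 2,
    "cbrt": lambda v: v ** (1 / 3),  # safe here because shifted values are positive
    "cube": lambda v: v ** 3,
    "reciprocal": lambda v: 1 / v,
}


def shift_and_transform(values, name, shift=2.0):
    """Apply the linear shift (+2 by default), then the named non-linear transform."""
    return [TRANSFORMS[name](v + shift) for v in values]


def simple_ols(x, y):
    """Fit y = alpha + beta * x by least squares.

    Returns (alpha, beta, r_squared, t_beta); t_beta is the t statistic for
    H0: beta = 0 with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    r_squared = 1 - sse / syy
    t_beta = beta / math.sqrt(sse / (n - 2) / sxx)
    return alpha, beta, r_squared, t_beta


# Synthetic illustration (not the paper's data): y grows exponentially in x.
x = list(range(1, 21))
y = [10 ** (v / 5) for v in x]

_, _, r2_raw, t_raw = simple_ols(x, y)
_, _, r2_log, t_log = simple_ols(x, shift_and_transform(y, "log10"))
# Shifting both variables by +2 is linear, so R^2 is unchanged.
_, _, r2_shifted, _ = simple_ols([v + 2 for v in x], [v + 2 for v in y])

print(f"R^2 raw = {r2_raw:.3f}, R^2 after log10(y + 2) = {r2_log:.3f}")
```

With a sample of $n = 50$ (48 degrees of freedom), $|t|$ above roughly 2.01 rejects $H_0: \beta = 0$ at the 5% level (two-sided); that critical value is an approximation quoted here, not a figure from the paper.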
Table 4: Analysis results of transformed data.

| Sn | Var1 = y | Var2 = x | Linear regression model | +/− r | Back transformation | $R^2$ | Ranking |
|---|---|---|---|---|---|---|---|
| 1 | $T$ | $\log_{10}D = D^*$ | $T = \alpha + \beta D^*$ | + | $\hat{T} = \alpha + \beta D^*$ | 55.7% | 1st |
| 2 | $\log_{10}D = D^*$ | $\sqrt[3]{T} = T^*$ | $D^* = \alpha + \beta T^*$ | + | $\hat{D} = 10^{\alpha + \beta T^*}$ | 52.2% | 2nd |
| 3 | $\log_{10}T = T^*$ | $\log_{10}D = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = 10^{\alpha + \beta D^*}$ | 46.4% | 3rd |
| 4 | $\sqrt{D} = D^*$ | $\sqrt[3]{T} = T^*$ | $D^* = \alpha + \beta T^*$ | + | $\hat{D} = (\alpha + \beta T^*)^2$ | 41.6% | 4th |
| 5 | $\log_2 T = T^*$ | $\log_{10}D = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = 2^{\alpha + \beta D^*}$ | 41.2% | 5th |
| 6 | $\sqrt{D} = D^*$ | $T^2 = T^*$ | $D^* = \alpha + \beta T^*$ | + | $\hat{D} = (\alpha + \beta T^*)^2$ | 40.6% | 6th |
| 7 | $T$ | $\sqrt{D} = D^*$ | $T = \alpha + \beta D^*$ | + | $\hat{T} = \alpha + \beta D^*$ | 40.5% | 7th |
| 8 | $\sqrt{T} = T^*$ | $\sqrt{D} = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = (\alpha + \beta D^*)^2$ | 35.5% | 8th |
| 9 | $\log_{10}T = T^*$ | $D^{-1} = D^*$ | $T^* = \alpha + \beta D^*$ | − | $\hat{T} = 10^{\alpha + \beta D^*}$ | 31.1% | 9th |
| 10 | $T^3 = T^*$ | $D$ | $T^* = \alpha + \beta D$ | + | $\hat{T} = \sqrt[3]{\alpha + \beta D}$ | 31.0% | 10th |
| 11 | $D$ | $T^2 = T^*$ | $D = \alpha + \beta T^*$ | + | $\hat{D} = \alpha + \beta T^*$ | 29.0% | 11th |
| 12 | $T^3 = T^*$ | $\sqrt[3]{D} = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = \sqrt[3]{\alpha + \beta D^*}$ | 27.1% | 12th |
| 13 | $\log_{10}D = D^*$ | $T^{-1} = T^*$ | $D^* = \alpha + \beta T^*$ | − | $\hat{D} = 10^{\alpha + \beta T^*}$ | 17.5% | |
| 14 | $T^{-1} = T^*$ | $D^{-1} = D^*$ | $T^* = \alpha + \beta D^*$ | + | $\hat{T} = (\alpha + \beta D^*)^{-1}$ | 13.9% | |

Figure 2: Scatter plot matrix of the first three ranked models.

4. Discussion of findings
Linearity, or near-linearity, was achieved between the transformed variables. Table 4 shows the $R^2$ values, the linear regression models and the back transformation (interpretation) for each pair of variables that showed a linear relationship after transformation. Twelve (12) of the fourteen transformed linear regression models show a positive relationship between the variables, and the independent variables proved to be significant (at the 5% significance level) for the corresponding dependent variable. These transformations established linearity only between the T and D variables; the relationships between the other variables did not become linear. We hope future research will focus attention on this.
Since the results in Table 4 are linear and significant, the models were ranked by their $R^2$ values; the Ranking column of Table 4 shows the result. The first 12 ranked models performed better, each with $R^2 > 23.6\%$, which was the highest $R^2$ value in Table 3 (obtained between T and D). Figure 2 shows the scatter plot matrix of the first three ranked models. Although this research uses $y$ for the dependent variable and $x$ for the independent variable in Table 4, researchers should note that these variables can be interchanged, with consequences for the back transformation of the model. To avoid the back-transformation bias on the dependent variable noted by Miller (1984), we recommend models 1, 7 and 11, since they do not transform the dependent variable.
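The ranking rule described above (order the models by $R^2$, keep those above the 23.6% baseline from Table 3, and prefer models whose dependent variable is untransformed) can be reproduced from the values reported in Table 4. A minimal Python sketch; the dictionary layout and variable names are illustrative choices:

```python
# R^2 values (%) reported in Table 4, keyed by model number (Sn).
# The boolean marks models whose dependent variable (y) is untransformed.
table4 = {
    1: (55.7, True), 2: (52.2, False), 3: (46.4, False), 4: (41.6, False),
    5: (41.2, False), 6: (40.6, False), 7: (40.5, True), 8: (35.5, False),
    9: (31.1, False), 10: (31.0, False), 11: (29.0, True), 12: (27.1, False),
    13: (17.5, False), 14: (13.9, False),
}
BASELINE_R2 = 23.6  # highest R^2 on the original data (T vs D, Table 3)

# Rank all models by R^2, highest first.
ranked = sorted(table4, key=lambda sn: table4[sn][0], reverse=True)
# Models that beat the original-data baseline.
better_than_baseline = [sn for sn in ranked if table4[sn][0] > BASELINE_R2]
# Among those, models needing no back transformation of y.
no_back_transform_bias = [sn for sn in better_than_baseline if table4[sn][1]]

print(no_back_transform_bias)  # prints: [1, 7, 11]
```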
5. Conclusion
In this research, we used statistical data transformation tools to prepare data mined from an online software repository, with sourceforge.net as a case study, for subsequent analysis by researchers seeking to improve the efficiency of repositories. It was established that data from online repositories violate the linearity assumption and may not show the significance required for regression analysis. Hence, we recommend that researchers study data mined from online repositories before using them for analysis. A combination of non-linear transformation tools proved effective in establishing linearity between repository variables. It was also established that, after transformation, only the download total and the total number of visitors who rate software on sourceforge could be made linear. This research has provided a ranked list of the 12 best linear regression models for direct use by researchers in future analysis.

References
Ahmed, E. Hassan (2008), "A road ahead for mining software repositories", IEEE, doi:10.1109/FOSM.2008.4659248, pp. 48-57.
Alexa (2014), www.alexa.com, retrieved on April 20, 2014.
Cleveland, W.S. (1984), "Graphical methods for data presentation: full scale breaks, dot charts, and multibased logging", The American Statistician, Vol. 38(4), pp. 270-280.
Howell, D.C. (2007), "Statistical methods for psychology", 6th edition, Belmont, CA: Thomson Wadsworth.
Jia, Siwei and Rathi, Sarika (2008), "On predicting log-transformed linear models with heteroscedasticity", SAS Global Forum, Paper 370-2008.
Manikandan, S. (2010), "Data transformation", J. Pharmacol. Pharmacother., Jul.-Dec., Vol. 1(2), pp. 126-127, doi:10.4103/0976-500x.72373.
Marija, J. Norusis (2004), "SPSS 12.0 guide to data analysis", Prentice Hall Inc., ISBN 0-13-147886-9.
Miller, D. (1984), "Reducing transformation bias in curve fitting", The American Statistician, Vol. 38(2), pp. 124-126.
O'Hara, Robert B. and Kotze, D. Johan (2010), "Do not log-transform count data", Methods in Ecology and Evolution, doi:10.1111/j.2041-210x.2010.00021.x.
Osborne, Jason (2002), "Notes on the use of data transformations", Practical Assessment, Research and Evaluation, Vol. 8(6).
Tabachnick, B.G. and Fidell, L.S. (2007), "Using multivariate statistics", 5th edition, Boston: Allyn and Bacon.
Tukey, J.W. (1977), "Exploratory data analysis", Reading, MA: Addison-Wesley.
Wikipedia (2014), "Sourceforge", www.en.wikipedia.org/wiki/sourceforge, retrieved on April 20, 2014.
www.sourceforge.net, Mining Software (Okikisoft), retrieved on March 13, 2014.