Reindex Process
Once a re-index has been triggered, a file called u1/agentshop/webstore/process_search.sem is
created. This file allows the cron job, which is set to run every minute, to run the shell script
u1/agentshop/scripts/search/process_search_live.sh.
Once the process has started, the actual re-index does not commence until a few database queries
have been processed. The file u1/agentshop/webstore/process_search.sem is then deleted and replaced
with u1/agentshop/webstore/process_search.in, a flag that lets any searches currently in progress
identify that a re-index is under way.
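The flag-file handoff described above can be sketched roughly as follows. This is only an illustration in Python, not the actual shell script: the function names trigger_reindex, start_reindex and reindex_in_progress are hypothetical, a temporary directory stands in for the webstore directory, and the swap from .sem to .in is done in one step here, whereas the real process runs a few database queries first. Only the two file names come from this document.

```python
import tempfile
from pathlib import Path

SEM_NAME = "process_search.sem"  # created when a re-index is triggered
IN_NAME = "process_search.in"    # present while a re-index is in progress


def trigger_reindex(webstore: Path) -> None:
    """Request a re-index by creating the .sem flag file."""
    (webstore / SEM_NAME).touch()


def start_reindex(webstore: Path) -> bool:
    """Roughly what the once-a-minute cron job does: swap .sem for .in.

    Returns True if a re-index was pending and is now flagged as running.
    """
    sem = webstore / SEM_NAME
    if not sem.exists():
        return False  # nothing was triggered, so there is nothing to do
    sem.replace(webstore / IN_NAME)  # removes .sem and creates .in
    return True


def reindex_in_progress(webstore: Path) -> bool:
    """Roughly what a live search could check before trusting the index."""
    return (webstore / IN_NAME).exists()


# Small demonstration in a throwaway directory (an assumption for the example).
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    trigger_reindex(demo_dir)
    print(start_reindex(demo_dir))        # True
    print(reindex_in_progress(demo_dir))  # True
```

Because the cron job only acts when the .sem file exists, a minute with no trigger costs one file check and nothing else.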
Process stages
1. process_search_sort_index.php: the first file processed by the shell script. It truncates the
search_sort_index table and then creates a temporary table named search_sort_index_temp, which is
populated by querying all products in the webcat table together with their priority order. Its
purpose is to display products in the order AP has set under the product groups tab in the CMS.
The search_sort_index table is only updated when a re-index has been initiated.
Example row from the search_sort_index table:

sortid  productid  prodno  priority  miscidtype  miscid  grouptail
P517    10032      21791   001-01    P           517

2. process_search_pre.php: prepares for the re-index by grabbing product data from three tables.
webcat table (stores SKUs and their related group IDs)
Products and their associated groups are read from the webcat table and stored by product ID.
Example rows from the webcat table:

PRODUCTID  PRODNO  GROUPID  PRIORITY  ACTIVE
31         0096    153      32        0
31         0096    229      381       0
31         0096    968      239       0

Example row from the search_temp_2 table (all webcat rows are grouped by product ID to populate search_temp_2):

productId  GroupIds
31         GRP153,GRP229,GRP968,GRP997,GRP1017,GRP1049,GRP1511,GRP1595,GRP1596,
           GRP1879,GRP1880,GRP1965,GRP1966,GRP1978,GRP1982,GRP1995,GRP1999,GRP2028
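
The grouping of webcat rows into search_temp_2 can be illustrated with a short Python sketch. It is an assumption-laden stand-in for whatever query process_search_pre.php actually runs: the function name build_search_temp_2 is hypothetical, and the "GRP" prefix is inferred only from the example row above. The input rows are the three webcat rows shown earlier.

```python
from collections import defaultdict

# (PRODUCTID, PRODNO, GROUPID, PRIORITY, ACTIVE), copied from the webcat example
WEBCAT_ROWS = [
    (31, "0096", 153, 32, 0),
    (31, "0096", 229, 381, 0),
    (31, "0096", 968, 239, 0),
]


def build_search_temp_2(rows):
    """Collapse webcat rows into one comma-separated group list per product."""
    groups = defaultdict(list)
    for product_id, _prodno, group_id, _priority, _active in rows:
        # The "GRP" prefix is an inference from the search_temp_2 example row.
        groups[product_id].append(f"GRP{group_id}")
    return {product_id: ",".join(ids) for product_id, ids in groups.items()}


print(build_search_temp_2(WEBCAT_ROWS))  # {31: 'GRP153,GRP229,GRP968'}
```

Storing the group IDs as one prefixed string per product gives the search engine a single text field to index for the group relation.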

menu and template_product tables (store all CMS pages), used to populate search_temp_3
Example rows from the menu table (note: not all fields are shown because the table is large):

Page ID  Page Name  ParentID  TemplateID  Published
0        Web Root   0         0           1
2        Lingerie   0         3115        1
517      Eyemasks   989       2006        1
987      Pyjamas    393       2006        1

Example rows from the template_product table (note: not all fields are shown because the table is
large):

TemplateID  Product Group ID  Version
2           :                 5
517         36:               1
987         98:               1

Example rows from the search_temp_3 table (product and page relations):

ProductID  Page Name                                           Page ID            Title Image ID
18545      Spring Summer 2013 Gallery Petronella& Hooded Lacy  MENU3128 MENU3162  3802
18545      Valentine's Day Romantic Ranges                     MENU1553 MENU3219  3824
18545      Lingerie New In                                     MENU2 MENU1552
18545      Lingerie Spring Summer 2013                         MENU2 MENU3129

Using the data gathered in the search_temp_3 table, a new table, search_temp_4, is created by
grouping the rows by product ID and concatenating the page names and page IDs (MENUID).
Example row from search_temp_4:

Product ID: 18545 (PetronellaWaspie)

Page Names (keywords for search):
Spring Summer 2013 Gallery Petronella& Hooded Lacy,Valentine's Day Romantic Ranges,
Lingerie New In,Lingerie Spring Summer 2013,Lingerie Spring Summer 2013,Lingerie Waspies,
LingerieBridal,Lingerie View All Lingerie,Lingerie View All Lingerie,
Lingerie Bridal BridalLingerie,Lingerie View All Bridal,LingeriePetronella& Hooded Lacy,
White,Corsetry,US TOP NAV Spring Summer 2013 Gallery Petronella& Hooded Lacy,
US MAIN NAV Valentine's Day Gifts Romantic Valentine's Gifts,US MAIN NAV Lingerie Waspies,
US MAIN NAV Lingerie Bridal,US MAIN NAV Lingerie Bridal BridalLingerie,
US MAIN NAV Bridal BridalLingerie,US MAIN NAV Bridal View All Bridal,US MAIN NAV White,
US MAIN NAV Corsetry

Page IDs:
MENU3128 MENU3162,MENU1553 MENU3219,MENU2 MENU1552,MENU2 MENU3129,MENU2 MENU3129,
MENU2 MENU1660,MENU2 MENU1512,MENU2 MENU1537,MENU2 MENU1537,MENU2 MENU2921 MENU2922,
MENU2 MENU2921 MENU2935,MENU2 MENU3269,MENU2877,MENU1981,MENU2019 MENU3171 MENU3187,
MENU2020 MENU2816 MENU3224,MENU2020 MENU2086 MENU2094,MENU2020 MENU2086 MENU2097,
MENU2020 MENU2086 MENU2097 MENU3262,MENU2020 MENU2106 MENU2107,MENU2020 MENU2106 MENU2110,
MENU2020 MENU2880,MENU2020 MENU2788

The page names are used as keywords for searching. For example, when searching for the term
“Romantic”, the product “PetronellaWaspie” (18545) should appear in the search results.
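
A rough Python sketch of how search_temp_3 rows collapse into a search_temp_4 row, and why the term “Romantic” then finds product 18545. The function names build_search_temp_4 and products_matching are hypothetical, only the first four search_temp_3 rows are used, and the plain substring match is a simple stand-in for Sphinx, which does its own tokenising and ranking.

```python
from collections import defaultdict

# (product_id, page_name, page_ids), taken from the search_temp_3 example
SEARCH_TEMP_3 = [
    (18545, "Spring Summer 2013 Gallery Petronella& Hooded Lacy", "MENU3128 MENU3162"),
    (18545, "Valentine's Day Romantic Ranges", "MENU1553 MENU3219"),
    (18545, "Lingerie New In", "MENU2 MENU1552"),
    (18545, "Lingerie Spring Summer 2013", "MENU2 MENU3129"),
]


def build_search_temp_4(rows):
    """Group rows by product ID and concatenate page names and page IDs."""
    names = defaultdict(list)
    ids = defaultdict(list)
    for product_id, page_name, page_ids in rows:
        names[product_id].append(page_name)
        ids[product_id].append(page_ids)
    return {pid: (",".join(names[pid]), ",".join(ids[pid])) for pid in names}


def products_matching(term, search_temp_4):
    """Naive case-insensitive keyword lookup over the concatenated page names."""
    needle = term.lower()
    return [
        pid
        for pid, (page_names, _page_ids) in search_temp_4.items()
        if needle in page_names.lower()
    ]


table = build_search_temp_4(SEARCH_TEMP_3)
print(products_matching("Romantic", table))  # [18545]
```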
3. The re-index begins storing all relevant product information in the Sphinx data source by
querying the products table joined with the ranges, styles, colours, sizes, rangesty,
search_temp_4 (for the page name relation) and search_temp_2 (for the group ID relation)
tables, using the fields below:
ranges.RANGEDES, ranges.EXTRAKEYS,
styles.STYLESDES, styles.GENERICDE, styles.EXTRAKEYS,
colours.COLDES, colours.EXTRAKEYS,
sizes.SIZEDES, sizes.SIZESEARCH, sizes.EXTRAKEYS,
rangesty.GRPDESCRIP, rangesty.PRODINFO, rangesty.EXTRAKEYS

All of the above fields are used as keywords to be indexed by the Sphinx search engine. Everything
up to this point is preparation for the re-index.
After these preparations the actual re-index is initiated. Once it has completed, step four comes
into play.

4. Process_search.php: runs dummy searches using the Sphinx search daemon, then stores the results
in the search_refine_results table. Unless these tables are populated, product listings will not
display. The results are then cached in a table called cache_main. Technically the results are
cached twice: first in search_refine_results and then in cache_main.
In evaluation, I’ve found that the step 4 database activity can put a large strain on AP’s server
when web traffic is high. All this activity seems to exist solely to gather keywords for the
Sphinx search engine, which may be necessary but is extremely costly.

Main points of concern
1. Sphinx isn’t being utilised as it should be. Sphinx is supposed to improve search
efficiency, but search results are fetched from MySQL approximately 95% of the
time. Sphinx is only used when a term is searched for the first time.
2. General product listing relies on Sphinx populating MySQL tables, which is why all
products disappear from the website when a re-index fails. I don’t believe Sphinx should
have anything to do with general product listing; it should only produce product
search results.
3. A lot of database queries run before the re-index, which I believe is the cause of the
server crashing during high volumes of traffic.
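The behaviour in point 1, where Sphinx is only consulted the first time a term is searched, amounts to a cache-first lookup. The sketch below is a simplified Python model of that pattern, not the site's code: the class name CachedSearch is hypothetical, a dictionary stands in for the MySQL cache tables, a plain function stands in for the Sphinx query, and the figure of 20 searches is purely illustrative.

```python
class CachedSearch:
    """Cache-first search: the engine is only asked about unseen terms."""

    def __init__(self, engine):
        self._engine = engine   # callable standing in for a Sphinx query
        self._cache = {}        # dict standing in for the MySQL cache tables
        self.engine_calls = 0   # how often the engine was actually used

    def search(self, term):
        key = term.strip().lower()
        if key not in self._cache:
            # First time this term is seen: query the engine and cache it.
            self.engine_calls += 1
            self._cache[key] = self._engine(key)
        return self._cache[key]


def fake_sphinx(term):
    """Hypothetical engine that only knows one term."""
    return [18545] if term == "romantic" else []


search = CachedSearch(fake_sphinx)
for _ in range(20):
    search.search("Romantic")
print(search.engine_calls)  # 1
```

In this model every repeat search is answered from the cache, so the engine's share of the work shrinks as the same terms recur.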
I have a few suggestions. Instead of gathering data from various tables to collect search
terms for each product, couldn’t we just add an extra field to the product CMS called
product tags, and then dynamically append extra search tags such as sizes, ranges, styles and
colours?
E.g.
Extra Search Tags: Bra, Lingerie, White
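
One way the tag idea could work is sketched below in Python. It is only an illustration of the suggestion, not an existing feature: the function name build_search_tags is hypothetical, and the attribute values passed in (including the size "34B") are made up for the example.

```python
def build_search_tags(extra_tags, attributes):
    """Combine the CMS tag field with attribute values, dropping duplicates.

    extra_tags: comma-separated text as typed into the proposed CMS field.
    attributes: values appended dynamically (sizes, ranges, styles, colours).
    """
    candidates = [tag.strip() for tag in extra_tags.split(",")]
    candidates += list(attributes)
    seen = set()
    result = []
    for tag in candidates:
        key = tag.lower()
        if tag and key not in seen:  # skip blanks and case-insensitive repeats
            seen.add(key)
            result.append(tag)
    return result


# "White" appears in both sources and is kept once; "34B" is a made-up size.
print(build_search_tags("Bra, Lingerie, White", ["White", "34B"]))
# ['Bra', 'Lingerie', 'White', '34B']
```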

With regard to the product listing pages, we could make a direct call to the database without
depending on the tables populated by Sphinx (theory to be proven). Then, if there is ever a
problem with a re-index in future, we remove the risk of the products vanishing completely from
the website.
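As a sketch of the "direct call" idea, the Python below lists the products in a group straight from a webcat-shaped table, ordered by priority, with no Sphinx-populated table involved. Several things here are assumptions: an in-memory SQLite database stands in for MySQL, the function name list_group_products is hypothetical, one of the rows is invented to show the ordering, and the meaning of the ACTIVE column is not known from this document, so it is not used as a filter.

```python
import sqlite3


def list_group_products(conn, group_id):
    """Return (PRODUCTID, PRODNO) for one group, ordered by PRIORITY."""
    cursor = conn.execute(
        "SELECT PRODUCTID, PRODNO FROM webcat WHERE GROUPID = ? ORDER BY PRIORITY",
        (group_id,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    "CREATE TABLE webcat ("
    "PRODUCTID INTEGER, PRODNO TEXT, GROUPID INTEGER, PRIORITY INTEGER, ACTIVE INTEGER)"
)
conn.executemany(
    "INSERT INTO webcat VALUES (?, ?, ?, ?, ?)",
    [
        (31, "0096", 153, 32, 0),   # from the webcat example in this document
        (40, "0100", 153, 5, 0),    # hypothetical row, added to show ordering
        (31, "0096", 229, 381, 0),  # from the webcat example in this document
    ],
)

print(list_group_products(conn, 153))  # [(40, '0100'), (31, '0096')]
```

A listing built this way would still render if a re-index failed, which is the risk the suggestion aims to remove; whether it performs well enough under load is the part still to be proven.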
We will need to assess this process to come up with a more efficient and more productive solution
for the future.

Regards
Peter

