<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ro">
	<id>https://wiki.dexonline.ro/index.php?action=history&amp;feed=atom&amp;title=Design_doc_Crawler</id>
	<title>Design doc Crawler - Revizia istoricului</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.dexonline.ro/index.php?action=history&amp;feed=atom&amp;title=Design_doc_Crawler"/>
	<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;action=history"/>
	<updated>2026-05-22T06:54:08Z</updated>
	<subtitle>Istoricul versiunilor pentru această pagină din wiki</subtitle>
	<generator>MediaWiki 1.37.1</generator>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=3200&amp;oldid=prev</id>
		<title>Cătălin.Frâncu la 4 februarie 2019 13:15</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=3200&amp;oldid=prev"/>
		<updated>2019-02-04T13:15:15Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;ro&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Versiunea anterioară&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Versiunea de la data 4 februarie 2019 16:15&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l194&quot;&gt;Linia 194:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 194:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== Setările de bază ===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== Setările de bază ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;1) în /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwwbase&lt;/del&gt;/Crawler trebuie: .htaccess, AbstractCrawler.php, AppLog.php, Crawler.php, MemoryManagement.php, simple_html_dom.php, WhiteList.txt, directoarele ParsedText și RawPage&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;1) în /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/Crawler trebuie: .htaccess, AbstractCrawler.php, AppLog.php, Crawler.php, MemoryManagement.php, simple_html_dom.php, WhiteList.txt, directoarele ParsedText și RawPage&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2) în /DEX/: la dex.conf trebuie adăugat fișierul /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwbase&lt;/del&gt;/Crawler/crawler_dex.conf (commentarii în fișier)&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2) în /DEX/: la dex.conf trebuie adăugat fișierul /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/Crawler/crawler_dex.conf (commentarii în fișier)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3) /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;phplib&lt;/del&gt;/models/CrawledPage.php și Link.php trebuie puse în același director pe server&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3) /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;lib&lt;/ins&gt;/models/CrawledPage.php și Link.php trebuie puse în același director pe server&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;4) Trebuie să puneți un fișier conținând user_agent-ul crawler-ului undeva într-un director care nu este public (am observat că unele site-uri nu cer user_agent (ex: romlit)).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;4) Trebuie să puneți un fișier conținând user_agent-ul crawler-ului undeva într-un director care nu este public (am observat că unele site-uri nu cer user_agent (ex: romlit)).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l206&quot;&gt;Linia 206:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 206:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Și apoi să-i dați calea absolută în dex.conf :: user_agent_location&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Și apoi să-i dați calea absolută în dex.conf :: user_agent_location&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;5) Editați fișierul /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwwbase&lt;/del&gt;/Crawler/WhiteList.txt, instrucțiuni de utilizare în fișier&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;5) Editați fișierul /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/Crawler/WhiteList.txt, instrucțiuni de utilizare în fișier&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== Rulare crawler ===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== Rulare crawler ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;în directorul /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwwbase&lt;/del&gt;/Crawler/ rulați comanda&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;în directorul /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/Crawler/ rulați comanda&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;php Crawler.php&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;php Crawler.php&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l216&quot;&gt;Linia 216:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 216:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;va porni crawlerul, acesta va lua primul domeniu din WhiteList.txt și va căuta linkuri nedescoperite în baza de date.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;va porni crawlerul, acesta va lua primul domeniu din WhiteList.txt și va căuta linkuri nedescoperite în baza de date.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Crawler-ul dispune de un fișier de log: /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwwbase&lt;/del&gt;/Crawler/crawler_log, iar ]n dex.conf puteți seta&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Crawler-ul dispune de un fișier de log: /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/Crawler/crawler_log, iar ]n dex.conf puteți seta&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;log2screen și log2file.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;log2screen și log2file.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l223&quot;&gt;Linia 223:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 223:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;URL-ul relativ va fi /Crawler/index.php, nu /Crawler/&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;URL-ul relativ va fi /Crawler/index.php, nu /Crawler/&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;1) Necesitatea fișierului /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwwbase&lt;/del&gt;/Crawler/index.php&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;1) Necesitatea fișierului /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/Crawler/index.php&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2) În directorul /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;phplib&lt;/del&gt;/ trebuie adăugat fișierul &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;SmartyWrap&lt;/del&gt;.php ce conține metoda&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2) În directorul /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;lib&lt;/ins&gt;/ trebuie adăugat fișierul &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Smart&lt;/ins&gt;.php ce conține metoda&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;syntaxhighlight lang=&amp;quot;php&amp;quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;syntaxhighlight lang=&amp;quot;php&amp;quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l234&quot;&gt;Linia 234:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 234:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3) În directorul /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwbase&lt;/del&gt;/ajax trebuie adăugat fișierul fetchCrawlerStatus.php din același director de pe SVN&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3) În directorul /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/ajax trebuie adăugat fișierul fetchCrawlerStatus.php din același director de pe SVN&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;4) În directorul /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwbase&lt;/del&gt;/styles trebuie adăugat fișierul crawler.css din același director de pe SVN&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;4) În directorul /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/styles trebuie adăugat fișierul crawler.css din același director de pe SVN&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;5) În directorul /DEX/templates/crawler trebuie adăugat fișierul crawler.ihtml din același director de pe SVN. Dacă am uitat ceva și programul nu merge, încercați să verificați scripturile încluse din crawler.ihtml.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;5) În directorul /DEX/templates/crawler trebuie adăugat fișierul crawler.ihtml din același director de pe SVN. Dacă am uitat ceva și programul nu merge, încercați să verificați scripturile încluse din crawler.ihtml.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;6) În directorul /DEX/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;wwwbase&lt;/del&gt;/Crawler/ trebuie adăugat fișierul .htaccess din același director de pe SVN.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;6) În directorul /DEX/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;www&lt;/ins&gt;/Crawler/ trebuie adăugat fișierul .htaccess din același director de pe SVN.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;(Autor: Alin Ungureanu)&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;(Autor: Alin Ungureanu)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Categorie:Design docs]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Categorie:Design docs]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff::1.12:old-814:rev-3200 --&gt;
&lt;/table&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=814&amp;oldid=prev</id>
		<title>Cătălin.Frâncu: /* Sumar */</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=814&amp;oldid=prev"/>
		<updated>2014-02-04T11:44:53Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Sumar&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;ro&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Versiunea anterioară&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Versiunea de la data 4 februarie 2014 14:44&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l7&quot;&gt;Linia 7:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 7:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* O statistică a folosirii diacriticelor&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* O statistică a folosirii diacriticelor&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Vezi și [[Crawler TODO]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Proiectarea in ansamblu ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Proiectarea in ansamblu ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff::1.12:old-813:rev-814 --&gt;
&lt;/table&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=813&amp;oldid=prev</id>
		<title>Cătălin.Frâncu la 4 februarie 2014 11:44</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=813&amp;oldid=prev"/>
		<updated>2014-02-04T11:44:32Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;ro&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Versiunea anterioară&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Versiunea de la data 4 februarie 2014 14:44&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l240&quot;&gt;Linia 240:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 240:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;6) În directorul /DEX/wwwbase/Crawler/ trebuie adăugat fișierul .htaccess din același director de pe SVN.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;6) În directorul /DEX/wwwbase/Crawler/ trebuie adăugat fișierul .htaccess din același director de pe SVN.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(Autor: Alin Ungureanu)&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Categorie:Design docs]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Categorie:Design docs]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff::1.12:old-812:rev-813 --&gt;
&lt;/table&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=812&amp;oldid=prev</id>
		<title>Cătălin.Frâncu la 4 februarie 2014 11:43</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=812&amp;oldid=prev"/>
		<updated>2014-02-04T11:43:44Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;ro&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Versiunea anterioară&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Versiunea de la data 4 februarie 2014 14:43&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l240&quot;&gt;Linia 240:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 240:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;6) În directorul /DEX/wwwbase/Crawler/ trebuie adăugat fișierul .htaccess din același director de pe SVN.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;6) În directorul /DEX/wwwbase/Crawler/ trebuie adăugat fișierul .htaccess din același director de pe SVN.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Categorie:Design docs]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Categorie:Design docs]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff::1.12:old-811:rev-812 --&gt;
&lt;/table&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=811&amp;oldid=prev</id>
		<title>Cătălin.Frâncu: /* Setări pentru monitorul din browser */</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=811&amp;oldid=prev"/>
		<updated>2014-02-04T11:42:50Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Setări pentru monitorul din browser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;ro&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Versiunea anterioară&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Versiunea de la data 4 februarie 2014 14:42&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l225&quot;&gt;Linia 225:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 225:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2) În directorul /DEX/phplib/ trebuie adăugat fișierul SmartyWrap.php ce conține metoda&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2) În directorul /DEX/phplib/ trebuie adăugat fișierul SmartyWrap.php ce conține metoda&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;syntaxhighlight lang=&quot;php&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;   static function smartyDisplay($skin) {&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;   static function smartyDisplay($skin) {&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;      &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;      &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;     self::$theSmarty-&amp;gt;display($skin);&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;     self::$theSmarty-&amp;gt;display($skin);&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;   }&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;   }&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/syntaxhighlight&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3) În directorul /DEX/wwwbase/ajax trebuie adăugat fișierul fetchCrawlerStatus.php din același director de pe SVN&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3) În directorul /DEX/wwwbase/ajax trebuie adăugat fișierul fetchCrawlerStatus.php din același director de pe SVN&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff::1.12:old-810:rev-811 --&gt;
&lt;/table&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=810&amp;oldid=prev</id>
		<title>Cătălin.Frâncu: /* Proiectarea in detaliu */</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=810&amp;oldid=prev"/>
		<updated>2014-02-04T11:42:29Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Proiectarea in detaliu&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;a href=&quot;https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;amp;diff=810&amp;amp;oldid=809&quot;&gt;Afișare diferențe&lt;/a&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=809&amp;oldid=prev</id>
		<title>Cătălin.Frâncu: /* Proiectarea in ansamblu */</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=809&amp;oldid=prev"/>
		<updated>2014-02-04T11:40:34Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Proiectarea in ansamblu&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;ro&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Versiunea anterioară&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Versiunea de la data 4 februarie 2014 14:40&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l48&quot;&gt;Linia 48:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Linia 48:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Astfel cand facem back-off si revenim putem avea unde s-a oprit.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Astfel cand facem back-off si revenim putem avea unde s-a oprit.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Proiectarea in detaliu ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Proiectarea in detaliu ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff::1.12:old-808:rev-809 --&gt;
&lt;/table&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
	<entry>
		<id>https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=808&amp;oldid=prev</id>
		<title>Cătălin.Frâncu: Pagină nouă: == Sumar ==  Construirea unui crawler și a unui index pentru site-uri românești (cu texte corecte din p.d.v. gramatical):  * Găsirea de cuvinte pe care DEX nu le conține pentr...</title>
		<link rel="alternate" type="text/html" href="https://wiki.dexonline.ro/index.php?title=Design_doc_Crawler&amp;diff=808&amp;oldid=prev"/>
		<updated>2014-02-04T11:40:04Z</updated>

		<summary type="html">&lt;p&gt;Pagină nouă: == Sumar ==  Construirea unui crawler și a unui index pentru site-uri românești (cu texte corecte din p.d.v. gramatical):  * Găsirea de cuvinte pe care DEX nu le conține pentr...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Pagină nouă&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Sumar ==&lt;br /&gt;
&lt;br /&gt;
Construirea unui crawler și a unui index pentru site-uri românești (cu texte corecte din p.d.v. gramatical):&lt;br /&gt;
&lt;br /&gt;
* Găsirea de cuvinte pe care DEX nu le conține pentru a le adauga (cu definiții)&lt;br /&gt;
* Când un utilizator caută definiții vrei să îi oferi și exemple în care s-ar folosi cuvântul respectiv.&lt;br /&gt;
* O statistică a folosirii diacriticelor&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Proiectarea in ansamblu ==&lt;br /&gt;
&lt;br /&gt;
Crawlerul indentifică si salveaza paginile raw, extrage text-ul relevant si il salveaza ca fisier, depune in DB o linie in tabelul CrawledPage(id, timestamp, url, httpStatus, rawPagePath, parsedPageText), apoi din parsarea paginii respective vor rezulta link-uri, acestea vor fi adaugate in tabelul Link(descriere la Crawler mai jos). Dupa un timp de asteptare, crawler-ul poate incepe accesarea unei noi pagini.&lt;br /&gt;
Astfel se cere o singura data pagina de la site, crawler-ul este separat de indexer, putand fi apelate independent.&lt;br /&gt;
&lt;br /&gt;
Indexer-ul se uita in baza de date in CrawledPage si selecteaza urmatoarea linie cu textul neindexat(verifica eventual in tabela index daca exista), apoi dupa indexare o marcheaza.&lt;br /&gt;
&lt;br /&gt;
Codul acesta zice ca se face exec in background&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;php&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
function execInBackground($cmd) {&lt;br /&gt;
&lt;br /&gt;
    exec($cmd . &amp;quot; &amp;gt; /dev/null &amp;amp;&amp;quot;);&lt;br /&gt;
}&lt;br /&gt;
?&amp;gt;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
L-am testat, putem deci rula mai multe instante de indexer, in cazul in care indexarea ar merge mai greu decat parsarea + T wait.&lt;br /&gt;
&lt;br /&gt;
Trebuie avută in vedere o verificare a fisierelor text, dacă au fost indexate. Pentru asta as propune un log + program php de monitorizare, in eventualitatea unui crash al sistemului.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Folosim log-uri peste tot si un panou (eventual meniu care alege fisierul log) care arata o statistica a activitatii.&lt;br /&gt;
&lt;br /&gt;
Ex&lt;br /&gt;
1)  timestamp | URL&lt;br /&gt;
saved raw page&lt;br /&gt;
parsed text&lt;br /&gt;
2)&lt;br /&gt;
.&lt;br /&gt;
.&lt;br /&gt;
.&lt;br /&gt;
42) timestamp | URL&lt;br /&gt;
saved raw page&lt;br /&gt;
parsed text&lt;br /&gt;
&lt;br /&gt;
Adaugam linii pe masura ce indeplinim activitati, astfel la un crash stim unde s-a oprit, in loc de saved raw page sau parsed text putem pune 0/1, T/F, etc&lt;br /&gt;
&lt;br /&gt;
Astfel cand facem back-off si revenim putem avea unde s-a oprit.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Proiectarea in detaliu ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Crawler ===&lt;br /&gt;
&lt;br /&gt;
- un mecanism prin care să mint site-ul că programul meu e browserul (se rezolvă cu php curl),&lt;br /&gt;
între accesul paginilor din site putem pune un T wait (timp de așteptare) 1 min.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baza de date&lt;br /&gt;
&lt;br /&gt;
    CrawledPage(id, timestamp, url, httpStatus, rawPagePath, parsedTextPath)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
    Link(id, url, canonicalUrl, domain, urlHash, crawledPageId)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
    httpStatus - HTTP status code&lt;br /&gt;
&lt;br /&gt;
    Verificam daca linkul a fost expandat prin a vedea dacă a intrat în tabelul CrawledPage&lt;br /&gt;
&lt;br /&gt;
    crawledPageId - același cu id de la CrawledPage, pagina în care am gasit link-ul&lt;br /&gt;
&lt;br /&gt;
    urlHash - comparăm ca să nu inserăm același link din nou(md5(canonicalUrl)).&lt;br /&gt;
&lt;br /&gt;
Totuși o să avem o problemă la linkuri relative vs linkuri absolute. Putem să eliminăm partea de Domain atunci cănd procesăm linkul, dar în final trebuie să compunem linkul absolut. Trebuie ignorate linkurile care n-au extensii relevante (imagini, etc).&lt;br /&gt;
    Nu am găsit încă o bibliotecă pentru asta asemănător cu realpath. Dacă e, facem noi un mecanism, compunem link-ul și apoi îl testăm încercând să accesăm pagina respectivă și vedem ce întoarce.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Parsare HTML ===&lt;br /&gt;
&lt;br /&gt;
- o bibliotecă pentru parsat pagini html, am folosit simple_html_dom (e mică dar își face treaba)&lt;br /&gt;
- as merge si pe o biblioteca care face strip html tags, acum depinde ce gasim si cum se comporta cu broken html&lt;br /&gt;
- dacă nu găsim nimic, putem trânti un regexp chior -- găsim &amp;lt;div&amp;gt;-ul în care stă conținutul care ne interesează și facem un regexp doar pentru acela. Asta va insemna ca o sa folosim regexp-uri diferite per site&lt;br /&gt;
&lt;br /&gt;
=== Indexer ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Indexer - php&lt;br /&gt;
baza de date - paris&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Tabele&lt;br /&gt;
	&lt;br /&gt;
Cuvant&lt;br /&gt;
&lt;br /&gt;
	id 	cuvantCuDiacritice	cuvantFaraDiacritice  idLexem=null (null daca nu are definitie)		&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Freq&lt;br /&gt;
&lt;br /&gt;
	id caleFisier idCuvant	freq(de aparitie sau de cate ori apare?)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
UseCase&lt;br /&gt;
&lt;br /&gt;
	id 	idCuvant  text (propozitii)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Mecanism diacritice&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Am să fac un tabel cu toate zonele care contin diacritice pe 8 sa 10 litere lungime și probabilitatea să se intample așa, plus un tabel în care o să țin aceste zone fără diacritice&lt;br /&gt;
&lt;br /&gt;
zonaFaraDiactice&lt;br /&gt;
&lt;br /&gt;
id  text(string(10))&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ZonaNeformatata&lt;br /&gt;
&lt;br /&gt;
id  idZonaFaraDiacritice  count(default 1) text(string(10))&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Facem count idZonaFaraDiacritice, apoi toate zonele cu același id (idZonaFaraDiacritice), calculam freq in tabelul FreqZone&lt;br /&gt;
&lt;br /&gt;
FreqZone&lt;br /&gt;
&lt;br /&gt;
idZonaNeformatata  freq(float)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Putem face un panou ca la diacritice.com (idee Catalin), dropdownselect cu cea mai probabilă variantă (urmate de restul).&lt;br /&gt;
&lt;br /&gt;
S-ar putea să nu aibă logică pe undeva că am scris-o în grabă. O mai revizuiesc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Old Post&lt;br /&gt;
&lt;br /&gt;
    - să găsesc cuvinte pe care DEX nu le știe, e destul de challenging cum aș face asta rapid, dacă cuvintele sunt ordonate alfabetic, atunci e mai simplu puțin, depinde și dacă ai copii de tabele ordonate invers sau nu (așa am putea folosi LIMIT 1 pentru a opri query-ul).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
    Cred că asta se poate face în timpul indexării. Pentru căutarea full text pe dexonline, de exemplu, indexez un cuvânt găsit cu toate lexemele de care ar putea aparține. Astfel, „copii” poate proveni din „copie” sau din „copil”. Cele care nu aparțin de niciun lexem sunt cele pe care le dorim (ai să râzi, dar și baza de definiții a dexonline conține câteva cuvinte necunoscute).&lt;br /&gt;
&lt;br /&gt;
    Mai trebuie tratate câteva cazuri speciale:&lt;br /&gt;
&lt;br /&gt;
    - Forme contrase (am uitat exact cum se numesc), de exemplu „gândindu-” din „gândindu-se”.&lt;br /&gt;
    - Forme compuse, care adesea nu apar ca atare în dicționar: nemai+verb (doar la participiu sau la gerunziu -- nemaipomenit, nemaicrezând), prefixe ca re-, ne-, anti- etc.&lt;br /&gt;
    - Nume proprii&lt;br /&gt;
&lt;br /&gt;
- daca avem o biblioteca toleranta la broken html, am putea face un parser general cu mai multe blocuri try-catch, (eventual logam tot ce se intampla aici), astfel evitam scrierea unei expresii regulate pentru fiecare site in parte.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
- la partea de store-tuples n-aș pune definiția în DB, mai degrabă un url local de fișier.&lt;br /&gt;
&lt;br /&gt;
index, timestamp, URL, raw_page_path, parsed_text_path&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Inserarea de diacritice ===&lt;br /&gt;
&lt;br /&gt;
- o tabelă de cuvinte care contin diacritice (cu toate formele), eventual un associative array în memorie folosind memcached sau altceva, cu cheia *cuvantul fara diacritice* si valoarea *cuvantul cu diacritice*&lt;br /&gt;
- pentru fiecare cuvânt parsat, verificăm dacă se poate scrie cu diacritice și îl înlocuim cu varianta respectivă.&lt;br /&gt;
&lt;br /&gt;
=== prelucrare tarball ===&lt;br /&gt;
&lt;br /&gt;
- daca link-urile sunt relative, ar merge crawler-ul si local, dar eu zic ca n-are rost sa functioneze la fel ca pentru parsare online, deoarece ar insemna salvarea din nou a paginii raw. In plus e mai rapid sa parcurgem toate fisierele si directoarele dintr-un director radacina. De aceea propun să avem un crawler local care doar extrage text si inserează în DB în tabelul CrawledPage.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== checkpoint &amp;amp; resume ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ca sa nu parcurgem de mai multe ori acelasi link, ideea e sa facem hash pe linkuri si sa comparam hash-urile.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
+ adaugare la panoul de monitorizare&lt;br /&gt;
+ functiuni la panoul de monitorizare cu posibilitatea de a reporni crawler-ul sau indexer-ul pe o anumita pagina/fisier text&lt;br /&gt;
&lt;br /&gt;
===fisier de configurare===&lt;br /&gt;
&lt;br /&gt;
2 variante&lt;br /&gt;
1) conf.ini&lt;br /&gt;
[db]&lt;br /&gt;
key1=value1&lt;br /&gt;
key2=value2&lt;br /&gt;
Global.php care incarca intr-o variabila array statica generată de parse_ini_file(config.ini)&lt;br /&gt;
&lt;br /&gt;
2) conf.php&lt;br /&gt;
&lt;br /&gt;
define(&amp;quot;_KEY1_&amp;quot;, &amp;quot;VAL1&amp;quot;);&lt;br /&gt;
define(&amp;quot;_KEY2_&amp;quot;, &amp;quot;VAL2&amp;quot;);&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Documentația Crawlerului ==&lt;br /&gt;
== [outdated] ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Nu uitați să instalați php-curl&lt;br /&gt;
&lt;br /&gt;
=== Baza de date ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Setările de bază ===&lt;br /&gt;
&lt;br /&gt;
1) în /DEX/wwwwbase/Crawler trebuie: .htaccess, AbstractCrawler.php, AppLog.php, Crawler.php, MemoryManagement.php, simple_html_dom.php, WhiteList.txt, directoarele ParsedText și RawPage&lt;br /&gt;
&lt;br /&gt;
2) în /DEX/: la dex.conf trebuie adăugat fișierul /DEX/wwwbase/Crawler/crawler_dex.conf (commentarii în fișier)&lt;br /&gt;
&lt;br /&gt;
3) /DEX/phplib/models/CrawledPage.php și Link.php trebuie puse în același director pe server&lt;br /&gt;
&lt;br /&gt;
4) Trebuie să puneți un fișier conținând user_agent-ul crawler-ului undeva într-un director care nu este public (am observat că unele site-uri nu cer user_agent (ex: romlit)).&lt;br /&gt;
&lt;br /&gt;
Mozilla/5.0 (compatible; MSIE 10.6; Windows NT 6.1; Trident/5.0; InfoPath.2; SLCC1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 2.0.50727) 3gpp-gba UNTRUSTED/1.0&lt;br /&gt;
&lt;br /&gt;
Și apoi să-i dați calea absolută în dex.conf :: user_agent_location&lt;br /&gt;
&lt;br /&gt;
5) Editați fișierul /DEX/wwwwbase/Crawler/WhiteList.txt, instrucțiuni de utilizare în fișier&lt;br /&gt;
&lt;br /&gt;
=== Rulare crawler ===&lt;br /&gt;
&lt;br /&gt;
în directorul /DEX/wwwwbase/Crawler/ rulați comanda&lt;br /&gt;
&lt;br /&gt;
php Crawler.php&lt;br /&gt;
&lt;br /&gt;
va porni crawlerul, acesta va lua primul domeniu din WhiteList.txt și va căuta linkuri nedescoperite în baza de date.&lt;br /&gt;
&lt;br /&gt;
Crawler-ul dispune de un fișier de log: /DEX/wwwwbase/Crawler/crawler_log, iar ]n dex.conf puteți seta&lt;br /&gt;
log2screen și log2file.&lt;br /&gt;
&lt;br /&gt;
=== Setări pentru monitorul din browser ===&lt;br /&gt;
&lt;br /&gt;
URL-ul relativ va fi /Crawler/index.php, nu /Crawler/&lt;br /&gt;
&lt;br /&gt;
1) Necesitatea fișierului /DEX/wwwwbase/Crawler/index.php&lt;br /&gt;
&lt;br /&gt;
2) În directorul /DEX/phplib/ trebuie adăugat fișierul SmartyWrap.php ce conține metoda&lt;br /&gt;
  static function smartyDisplay($skin) {&lt;br /&gt;
    &lt;br /&gt;
    self::$theSmarty-&amp;gt;display($skin);&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
3) În directorul /DEX/wwwbase/ajax trebuie adăugat fișierul fetchCrawlerStatus.php din același director de pe SVN&lt;br /&gt;
&lt;br /&gt;
4) În directorul /DEX/wwwbase/styles trebuie adăugat fișierul crawler.css din același director de pe SVN&lt;br /&gt;
&lt;br /&gt;
5) În directorul /DEX/templates/crawler trebuie adăugat fișierul crawler.ihtml din același director de pe SVN. Dacă am uitat ceva și programul nu merge, încercați să verificați scripturile încluse din crawler.ihtml.&lt;br /&gt;
&lt;br /&gt;
6) În directorul /DEX/wwwbase/Crawler/ trebuie adăugat fișierul .htaccess din același director de pe SVN.&lt;br /&gt;
[[Categorie:Design docs]]&lt;/div&gt;</summary>
		<author><name>Cătălin.Frâncu</name></author>
	</entry>
</feed>