Adaptive Web Servers
Marek Wojciechowski, Maciej Zakrzewicz
Poznań University of Technology, Institute of Computing Science
ul. Piotrowo, Poznań
e-mail: {marek,mzakrz}@cs.put.poznan.pl
Abstract.
Adaptive web servers analyze their log files in order to automatically transform the content and
structure of the documents they serve. As a result, the web server "adapts" itself to the user's
expectations by "guessing" the user's intentions. This article presents the available methods for
the automated analysis of log files and for applying the discovered trends and correlations to
the dynamic transformation of web documents.
1. Introduction
Designing the structure of a web server's content is, in general, a complex and difficult
problem. Designers must work out the appearance of web documents and the links between them so
that the documents are readable and easy for users to navigate. The following factors affect the
problem of designing the content structure of a web server:
1. Different users may use the web server to find different information.
For example, if a young user connects to an online shop, it would be good for the home page to
contain information about the latest computer games. When an older user requests the home page,
however, it would be advisable to place on it information about the latest books and classical
music records.
2. At different points in time, a single user may look for different information.
For example, in winter a user of an online travel agency will be interested in web documents
containing information about ski resorts in the Alps, whereas in summer the same user would
prefer to be shown web documents describing holidays in the Mediterranean basin.
3. The content of a web server grows over time.
When new web documents are added to an existing system, the designer must make a responsible
decision about which of the existing documents should contain links to the new part of the
system.
A technical solution that addresses these problems is the personalization of web server content.
Personalization consists in using knowledge about user preferences to dynamically adjust the
appearance and structure of the documents sent. As a result, each user may receive a different
view of the content and structure of the documents of the same server. Knowledge about user
preferences can be acquired explicitly, by giving users forms and configuration tools, or
implicitly, by observing the style of their past interaction with the server. Many of the web
content personalization solutions in use today rely heavily on information obtained from the
user explicitly. This form of knowledge acquisition is highly subjective and meets with
reluctance from users who are "forced" to fill in additional forms and surveys. Moreover, user
profiles built in this way are static and degrade over time.
In recent years, increasing attention has been drawn to methods that personalize web server
content by implicitly observing trends in the behavior of web users. In [PE97], the term
adaptive web servers (adaptive web sites) was proposed to describe web servers that
automatically improve their content and organization based on observation of user access paths.
The idea of adaptive servers is to analyze server log files, extract from them statistical
correlations between the documents retrieved or between the users, and then use the correlations
found to build the structure of the web documents sent to users. In this article we describe the
state of the art in methods for constructing adaptive web servers.
2. Automatic adaptation of a web server
The adaptation of a web server proceeds in two phases:
1. Offline: the server log file is used to find and group the most frequent navigation paths of
users. This phase is carried out asynchronously with respect to user connections, e.g. at weekly
or monthly intervals.
2. Online: the groups of navigation paths found are used to create dynamic recommendations for
users, i.e. a set of links to the documents in which these users are statistically most likely
to be interested. This phase is carried out while each user request is being served.
Let us present an example of a simple web server adaptation, illustrated in Figure 1. The web
server was visited by five users, whose complete navigation paths were recorded in the log file.
In the first phase of adaptation (offline), the log file is analyzed and the following frequent
paths are found: books.html -> albums.html, books.html -> ord.html, car.html -> radio.html.
Each of these paths recurs in the visits recorded in the log file (in Figure 1, each occurs in
two of the five visits), and we will therefore treat them as preferences for other users. In the
second phase (online), a new user sends the server a request for a web document (books.html).
The server reads the document from disk and looks through the set of frequent paths found
earlier; it shows that users who retrieved books.html were later interested in the documents
albums.html and ord.html. Therefore, to make navigation easier, links to these documents are
dynamically added to books.html. The document modified in this way is sent to the user
(Figure 2).
Server log file:
    homepage.html -> books.html -> products.html -> albums.html -> ord.html
    car.html -> books.html -> radio.html -> cdisks.html
    books.html -> download.html -> ord.html -> albums.html -> contact.html
    homepage.html -> car.html -> demo.html -> depts.html
    car.html -> download.html -> radio.html
    ...
Finding frequent paths yields the preferences:
    books.html -> albums.html
    books.html -> ord.html
    car.html -> radio.html
User request:
    http://x.com/books.html
Transformation of document books.html using the preferences gives the modified document:
    books.html
    link to albums.html
    link to ord.html
Fig. 1. Sample adaptation process of a web server
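The offline search for frequent paths in this example can be sketched in Python. This is only a minimal illustration under stated assumptions, not the authors' algorithm: a path is taken to be an ordered pair of pages visited in the same session (not necessarily one directly after the other), and the function name frequent_paths and the threshold min_support=2 are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Sessions copied from the log excerpt in Figure 1.
sessions = [
    ["homepage.html", "books.html", "products.html", "albums.html", "ord.html"],
    ["car.html", "books.html", "radio.html", "cdisks.html"],
    ["books.html", "download.html", "ord.html", "albums.html", "contact.html"],
    ["homepage.html", "car.html", "demo.html", "depts.html"],
    ["car.html", "download.html", "radio.html"],
]


def frequent_paths(sessions, min_support=2):
    """Return ordered page pairs (a, b), with a visited before b, that occur
    in at least min_support sessions (min_support is an assumed threshold)."""
    counts = Counter()
    for pages in sessions:
        # combinations() keeps the input order, so each pair is (earlier, later);
        # the set makes sure a pair is counted once per session.
        counts.update(set(combinations(pages, 2)))
    return sorted(pair for pair, n in counts.items() if n >= min_support)


for first, later in frequent_paths(sessions):
    print(f"{first} -> {later}")
```

With the five sessions of Figure 1 and a minimum support of two sessions, the sketch reports exactly the three paths named in the text: books.html -> albums.html, books.html -> ord.html and car.html -> radio.html.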
[Figure: web documents in which the original content of the document is accompanied by
dynamically added links to the documents the user is most likely to be interested in]
Fig. 2. Examples of web documents enriched with dynamic recommendations
2.1. Offline phase of the algorithm
Log file structure
Information about accesses to a web server is recorded in a log. For each access to a single
file located on the server, a new entry appears in the log. However, the amount of information
stored for a given access may differ between web servers. To make it possible to build universal
log analysis tools, attempts have been made to standardize the log format. Today it can be
assumed that the vast majority of web servers generate log files conforming to the format known
as the Common Logfile Format [L95]. It is not, however, a fully binding standard, since some
servers also record certain additional information (the XLF standard). The Common Logfile Format
specifies that a log entry should have the following form:
remotehost rfc931 authuser [date] "request" status bytes
In the above format, the remotehost field denotes the name or IP address of the computer from
which the request originated. The rfc931 field contains the name of the user on that computer
(logname). The authuser field contains information about who the user claims to be. The [date]
field tells when the request occurred (date and time). The "request" field contains the request
sent to the server in the form in which the client generated it. It usually includes the type of
operation and the name of the requested file together with its access path. The status field
contains the status code returned to the client, in accordance with the HTTP protocol used by
the web service. The length of the content of the transferred document is stored in the bytes
field. An example of the contents of a web server log file is shown in Figure 3.
154.11.231.17 - - [13/Jul/2000:20:42:25 +0200] "GET / HTTP/1.1" 200 1673
154.11.231.17 - - [13/Jul/2000:20:42:25 +0200] "GET /apache_pb.gif HTTP/1.1" 200 2326
192.168.1.25 - - [13/Jul/2000:20:42:25 +0200] "GET /demo.html HTTP/1.1" 200 520
192.168.1.25 - - [13/Jul/2000:20:42:25 +0200] "GET /books.html HTTP/1.1" 200 3402
160.81.77.20 - - [13/Jul/2000:20:42:25 +0200] "GET / HTTP/1.1" 200 1673
154.11.231.17 - - [13/Jul/2000:20:42:25 +0200] "GET /car.html HTTP/1.1" 200 2580
192.168.1.25 - - [13/Jul/2000:20:42:25 +0200] "GET /cdisk.html HTTP/1.1" 200 3856
10.111.62.101 - - [13/Jul/2000:20:42:25 +0200] "GET /new/demo.html HTTP/1.1" 200 971
Fig. 3. Sample web server log file
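A log entry in the format described above can be split into its fields with a regular expression. The following Python sketch is one possible way to do it; the names CLF_PATTERN and parse_clf_line and the extra "path" key are illustrative assumptions, and servers that log additional fields would need an extended pattern.

```python
import re
from datetime import datetime

# remotehost rfc931 authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Month abbreviations such as "Jul" assume the default (English) locale.
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # The request usually reads "METHOD path PROTOCOL"; keep the path if present.
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else None
    return entry


entry = parse_clf_line(
    '192.168.1.25 - - [13/Jul/2000:20:42:25 +0200] "GET /books.html HTTP/1.1" 200 3402'
)
print(entry["remotehost"], entry["path"], entry["status"], entry["bytes"])
```

Lines that do not follow the format yield None, so a caller can count or skip them instead of failing on the first malformed entry.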
Identifying navigation paths
From the point of view of the analysis, the relevant information in a web server log is: the
name or IP address of the computer from which the request originated, the name of the user
making the request, the exact date and time, and the full name of the requested file. Log file
analysis consists in finding frequently recurring sequences in the paths of user accesses to the
web server, or in grouping users who exhibit similar behavior. For this reason, a necessary step
in preprocessing the log data is grouping the entries that concern requests by the same user.
This grouping is based on the IP address or host name and on the user name. Unfortunately, the
user name is not always known. This is often the case when the user works with an operating
system that does not provide for multi-user access. Fortunately, the fact that a computer
running a non-multi-user operating system can be used by only one user at a time makes it
possible to treat requests coming from the same computer as requests by a single user when the
user name is unknown. Of course, this assumption holds only for requests whose times fall within
a period corresponding to the possible duration of a single user session. The mechanism
therefore does not allow access sequences to be identified across many sessions of one user
over, say, a month, since many people may use a given computer at different hours.
Because a user may use the services of a given web server many times, each time looking for
different information, it is sometimes advisable to split the access sequence of a given user
into fragments corresponding to individual sessions. This is not a trivial task, however, since
the HTTP protocol has no notion of a session. The simplest solution is to separate user sessions
on the assumption that if the time between consecutive requests to the server is much longer
than the typical time spent viewing one page, the requests belong to two different sessions. An
alternative solution is to extend the server with support for session identifiers for the
period during which information about user behavior is collected [YJG+96].
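The time-gap rule for separating sessions can be illustrated with a short Python sketch. It assumes that requests are already grouped by host and carry timestamps in seconds; the function name split_into_sessions and the 30-minute threshold are hypothetical choices, not values given in the article.

```python
from collections import defaultdict


def split_into_sessions(requests, max_gap_seconds=1800):
    """Group (host, timestamp_seconds, page) records by host and start a new
    session whenever two consecutive requests of one host are further apart
    than max_gap_seconds (an assumed threshold)."""
    by_host = defaultdict(list)
    for host, timestamp, page in requests:
        by_host[host].append((timestamp, page))

    sessions = []
    for host, visits in by_host.items():
        visits.sort()  # chronological order within one host
        current = [visits[0][1]]
        for (prev_time, _), (time, page) in zip(visits, visits[1:]):
            if time - prev_time > max_gap_seconds:
                sessions.append((host, current))
                current = []
            current.append(page)
        sessions.append((host, current))
    return sessions


requests = [
    ("192.168.1.25", 0, "/demo.html"),
    ("192.168.1.25", 40, "/books.html"),
    ("192.168.1.25", 7200, "/cdisk.html"),  # two hours later: a new session
    ("154.11.231.17", 10, "/car.html"),
]
for host, pages in split_into_sessions(requests):
    print(host, pages)
```

A fixed threshold is the simplest choice; a threshold derived from the observed distribution of viewing times would follow the same structure.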
The goal of analyzing web server log files may be to find frequent navigation paths or to find
groups of pages that users often request within a session. In the first case, information about
all the pages requested by the user is relevant, including the order of the requests. In the
other cases, it may turn out that only requests for pages whose content interested the user are
relevant (pages serving merely as an access path to the document sought are not taken into
account). In [CMS97], a division of page references into content-oriented and
navigation-oriented was proposed. Some pages contain mainly links to other pages, so references
to them will certainly be navigational. However, many pages contain both content and links to
other pages. Such pages may serve different purposes for different users. A reasonable criterion
for dividing accesses into navigation-oriented and content-oriented therefore seems to be the
time the user spends on a given page (possibly normalized with respect to the page size). The
viewing time of a page is computed as the difference between the timestamps of two consecutive
log entries, corresponding to the next page and the current page. For pages that end a user
session, it is assumed that they were accessed for their content, although in a particular case
this need not be true.
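The viewing-time criterion can be sketched as follows. The Python function below is a minimal illustration: the name classify_references and the 30-second threshold are assumptions, and normalization by page size is left out.

```python
def classify_references(visits, content_threshold_seconds=30):
    """Label each page of one session as 'navigation' or 'content'.

    visits is a chronological list of (timestamp_seconds, page). The viewing
    time of a page is the gap to the next request; the last page of a session
    is assumed to be content. The threshold value is an assumption."""
    labels = []
    for (time, page), (next_time, _) in zip(visits, visits[1:]):
        viewing_time = next_time - time
        if viewing_time >= content_threshold_seconds:
            kind = "content"
        else:
            kind = "navigation"
        labels.append((page, viewing_time, kind))
    if visits:
        # No following entry exists, so no viewing time can be computed.
        labels.append((visits[-1][1], None, "content"))
    return labels


session = [(0, "/"), (5, "/books.html"), (125, "/ord.html")]
for page, seconds, kind in classify_references(session):
    print(page, seconds, kind)
```

In this example the home page is viewed for 5 seconds and is labeled navigational, books.html is viewed for 120 seconds and is labeled content, and ord.html ends the session and is labeled content by convention.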
Problems in log file processing
The information contained in the log may be not only incomplete but also distorted, because of
the use of proxy servers and browser caches [P97]. A proxy server serves as a "window on the
world" for many computers, giving the users working on them access to the Internet. Entries in
the web server log that correspond to requests from users of computers "hidden" behind a proxy
server carry the address of the proxy server. Consequently, the fact that several log entries
concern one IP address does not necessarily mean that they correspond to requests from the same
computer. In [PPR96], a method for detecting such situations was proposed, based on the
assumption that if a request concerns a document to which there is no link in the previously
requested document, then the requests probably come from two different users. Although
experience shows [CP95] that access to the next document is most often the result of choosing a
link (hyperlink) available in the current document or of returning to the previous document
(the "Back" operation), this method gives no certainty. For this reason, so-called cookies or
additional authorization are used to identify users. A cookie is an identifier generated by the
server and sent to the client (browser) for the purpose of later identifying the user. The
imperfection of this mechanism stems from the fact that users can delete a cookie at any time or
refuse to accept cookies altogether. Additional identification of users by requiring them to
fill in a registration form also depends on the users' good will, since the data they provide
may well be false.
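The link-based heuristic for telling apart users hidden behind one IP address can be sketched in Python. This is a simplified illustration under stated assumptions, not the method of [PPR96] itself: the site's link structure is assumed to be available as a dictionary, a request is attached to the first presumed user whose last page links to it (or who has already seen it, which approximates the "Back" operation), and the name split_by_links is hypothetical.

```python
def split_by_links(pages, links):
    """Assign a chronological list of requests from one IP address to presumed users.

    links maps a page to the set of pages it links to. A request that cannot be
    reached from the last page of any existing trail starts a new presumed user."""
    users = []  # each entry is the list of pages attributed to one presumed user
    for page in pages:
        for trail in users:
            reachable = page in links.get(trail[-1], ())
            revisit = page in trail  # e.g. the "Back" operation
            if reachable or revisit:
                trail.append(page)
                break
        else:
            users.append([page])
    return users


links = {
    "/": {"/books.html", "/car.html"},
    "/books.html": {"/albums.html"},
    "/car.html": {"/radio.html"},
}
pages = ["/books.html", "/car.html", "/albums.html", "/radio.html"]
print(split_by_links(pages, links))
```

Here car.html is not linked from books.html, so the sketch attributes the requests to two presumed users. As the text notes, such a result is only a guess and gives no certainty.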
A problem as important as identifying users is identifying the actual references to documents.
Because browsers use a cache, subsequent requests by a given user for the same document may not
be recorded on the server, since they may be satisfied by fetching the document from the browser
cache rather than from the server. This can significantly distort the discovered user navigation
paths. An even more serious problem results from caching by proxy servers. If a user accessing
the Internet through a proxy server requests a document that is held in the proxy cache, the web
server may be entirely unaware that the user requested the document. To guard against such
situations, web servers can apply techniques that prevent the use of caches, known as
cache-busting, which consist, for example, in giving dates in the past as the expiration times
of individual documents. Techniques of this kind may be a nuisance for users, because they
lengthen response times. For this reason, proposals have appeared to limit monitoring to a
statistical sample instead of all accesses to the server, and to perform the analyses on that
basis.
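As a small illustration of cache-busting, the Python sketch below builds response headers that declare a document expired as soon as it is sent. The function name cache_busting_headers is hypothetical, and the Cache-Control header is an added assumption; the text itself mentions only expiration dates in the past.

```python
from email.utils import formatdate


def cache_busting_headers():
    """Response headers that mark a document as already expired."""
    return {
        # A date in the past (the Unix epoch) as the expiration time.
        "Expires": formatdate(0, usegmt=True),
        # Assumption: HTTP/1.1 caches are also told to revalidate every time.
        "Cache-Control": "no-cache",
    }


for name, value in cache_busting_headers().items():
    print(f"{name}: {value}")
```

Sending such headers with every page makes each visit reach the server and appear in the log, at the cost of the longer response times mentioned above.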
Log file cleaning
Data preprocessing does not end with identifying the requests of individual users. Log entries
concern single files rather than documents treated as composite objects. When a page containing,
for example, images, sounds or films is accessed, the log will contain an entry for the main
document (usually with the extension html or htm), but also entries for all the objects embedded
in the page (images, films, etc.). Fortunately, the nature of a file can largely be inferred
from its extension. Examples of file name extensions corresponding to objects embedded in
documents are jpg, jpeg, gif for images, au, wav for sounds, and avi, mov for films. So that the
source data for the analyses contain only information about accesses to relevant documents, the
web server log file should be filtered, ignoring entries for files that are not main documents
corresponding to so-called Web pages.
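The filtering step can be sketched in a few lines of Python. The extension list is the one given in the text; the names EMBEDDED_EXTENSIONS and is_page_request, and the handling of query strings, are illustrative assumptions.

```python
import os

# Extensions named in the text for embedded objects (images, sounds, films).
EMBEDDED_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".au", ".wav", ".avi", ".mov"}


def is_page_request(path):
    """Return True unless the extension of path marks an embedded object."""
    path = path.split("?", 1)[0]  # drop a query string, if any
    extension = os.path.splitext(path)[1].lower()
    return extension not in EMBEDDED_EXTENSIONS


paths = ["/", "/apache_pb.gif", "/demo.html", "/books.html"]
pages = [p for p in paths if is_page_request(p)]
print(pages)
```

Applied to the requested paths of the sample log, the filter drops /apache_pb.gif and keeps the page requests.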
Discovering user preferences
User preferences are represented by sets of similar, most frequently used navigation paths. To
find the preferences, a two-phase algorithm is carried out:
1. Search the web server log to find all the most frequently occurring navigation paths.
2. Group the navigation paths found according to their joint use by users, i.e. two paths are
similar when many of the users who follow one of them also follow the other.
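The second step, grouping paths by joint use, can be illustrated with the Python sketch below. The article does not fix a similarity measure or a clustering method, so both are assumptions here: similarity is the Jaccard coefficient of the sets of sessions that follow each path, and paths are merged greedily when it reaches a hypothetical threshold min_similarity.

```python
from itertools import combinations

# Sessions and frequent paths taken from the example of Figure 1.
sessions = [
    ["homepage.html", "books.html", "products.html", "albums.html", "ord.html"],
    ["car.html", "books.html", "radio.html", "cdisks.html"],
    ["books.html", "download.html", "ord.html", "albums.html", "contact.html"],
    ["homepage.html", "car.html", "demo.html", "depts.html"],
    ["car.html", "download.html", "radio.html"],
]
paths = [
    ("books.html", "albums.html"),
    ("books.html", "ord.html"),
    ("car.html", "radio.html"),
]


def follows(session, path):
    """True if the pages of path occur in session in the same order (gaps allowed)."""
    remaining = iter(session)
    return all(page in remaining for page in path)


def group_paths(paths, sessions, min_similarity=0.5):
    """Greedy single-link grouping of paths by the Jaccard similarity of the
    sets of sessions that follow them (measure and threshold are assumptions)."""
    users = {p: {i for i, s in enumerate(sessions) if follows(s, p)} for p in paths}
    group_of = {p: {p} for p in paths}
    for a, b in combinations(paths, 2):
        union = users[a] | users[b]
        similarity = len(users[a] & users[b]) / len(union) if union else 0.0
        if similarity >= min_similarity and group_of[a] is not group_of[b]:
            merged = group_of[a] | group_of[b]
            for p in merged:
                group_of[p] = merged
    unique = []
    for group in group_of.values():
        if group not in unique:
            unique.append(group)
    return unique


groups = group_paths(paths, sessions)
for group in groups:
    print(sorted(group))
```

On the example data, the two paths starting at books.html are followed by the same two sessions and end up in one group, while car.html -> radio.html forms a group of its own.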
2.2. Online phase of the algorithm
From the moment a user first connects to the web server, all operations of that user are
recorded in the form of a so-called session history. Each time the user requests a document, the
session history is matched against the existing groups of navigation paths, and the groups
showing the best match are selected. The set of links to the documents described in the
navigation paths of the matched groups becomes an additional visual element that is dynamically
attached to the requested document [YJG+96].
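The online phase can be sketched in Python as follows. The matching measure is an assumption made for illustration (a group scores one point for each of its paths whose first page occurs in the session history), and the names recommend and add_links are hypothetical; the article does not prescribe either.

```python
def recommend(history, path_groups):
    """Return the pages to link from the requested document.

    history: pages visited so far in the session, the requested page last.
    path_groups: list of groups, each a list of navigation paths (tuples).
    The best-scoring groups contribute the not yet visited pages of their
    matching paths."""
    visited = set(history)
    scores = [sum(1 for path in group if path[0] in visited) for group in path_groups]
    best = max(scores, default=0)
    links = []
    if best == 0:
        return links
    for group, score in zip(path_groups, scores):
        if score != best:
            continue
        for path in group:
            if path[0] not in visited:
                continue
            for page in path[1:]:
                if page not in visited and page not in links:
                    links.append(page)
    return links


def add_links(html, links):
    """Append the recommended links to the document as a simple HTML list."""
    if not links:
        return html
    items = "".join(f'<li><a href="{page}">{page}</a></li>' for page in links)
    return html + f"<ul>{items}</ul>"


path_groups = [
    [("books.html", "albums.html"), ("books.html", "ord.html")],
    [("car.html", "radio.html")],
]
links = recommend(["books.html"], path_groups)
print(add_links("<p>Books</p>", links))
```

For a session that has just requested books.html, the sketch recommends albums.html and ord.html, which corresponds to the modified document in the example of Section 2.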
3. Summary
This article has presented the architecture of a system for the automatic personalization of web
server content, which makes it possible to build web environments that adapt to user behavior. A
module extending the functionality of Oracle Application Server with adaptivity understood in
this way is currently being developed at the Institute of Computing Science of Poznań University
of Technology. With the approach presented, part of the responsibility for the appearance and
structure of web server content is transferred from designers to users.
References
[CP95]
Catledge L.D., Pitkow J.E., "Characterizing Browsing Strategies in the World Wide Web", Proc. of
the 3rd Int'l World Wide Web Conference, 1995.
[CM99]
Cooley R., Mobasher B., Srivastava J., "Data preparation for mining World Wide Web browsing
patterns", Journal of Knowledge and Information Systems, 1, 1999.
[CMS97]
Cooley R., Mobasher B., Srivastava J., "Grouping Web Page References into Transactions for
Mining World Wide Web Browsing Patterns", Proc. of the 1997 IEEE Knowledge and Data
Engineering Exchange Workshop (KDEX), Newport Beach, California, November 1997.
[H75]
Hartigan J., Clustering Algorithms, John Wiley, 1975.
[HKM97]
Han E-H., Karypis G., Kumar V., Mobasher B., "Clustering based on association rule
hypergraphs", Proc. of SIGMOD'97 Workshop on Research Issues in Data Mining and
Knowledge Discovery (DMKD'97), May 1997.
[L95]
Luotonen A., "The common log file format", http://www.w3.org/pub/WWW/, 1995.
[PPR96]
Pirolli P., Pitkow J., Rao R., "Silk From a Sow's Ear: Extracting Usable Structure from the World
Wide Web", Conference on Human Factors in Computing Systems (CHI 96), Vancouver, British
Columbia, Canada, 1996.
[P97]
Pitkow J., "In search of reliable usage data on the www", Sixth Int'l World Wide Web Conference,
Santa Clara, California, 1997.
[YJG+96]
Yan T.W., Jacobsen M., Garcia-Molina H., Dayal U., "From User Access Patterns to Dynamic
Hypertext Linking", Proc. of the 5th Int'l World Wide Web Conference, 1996.
[PE97]
Perkowitz M., Etzioni O., "Adaptive Web Sites: an AI challenge", Proc. 15th Int. Joint Conf. AI,
1997.