240 Jarosław Gramacki. Artur Gramacki
Table 10. Sample document to be summarized: an introduction to Oracle Text (26 sentences in English). The text is shown with its constituent sentences clearly marked.
| No. | Sentence |
|---|---|
| 1 | Oracle Text, Oracle's integrated full-text retrieval technology, is part of the Oracle 11g Standard and Enterprise Editions. |
| 2 | Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, and on the Web. |
| 3 | Oracle Text can perform linguistic analysis on documents; search text using a variety of strategies including keyword searching, contextual queries, Boolean operations, pattern matching, mixed thematic queries, HTML/XML section searching, etc. |
| 4 | Oracle Text excels at mixed queries, those that involve structured relational attributes as well as text. |
| 5 | Oracle Text can render search results in various formats including unformatted text, HTML with term highlighting, and original document format. |
| 6 | Oracle Text supports multiple languages and uses advanced relevance-ranking technology to improve search quality. |
| 7 | Over the last decade, organizations have invested heavily in systems that enable rapid access to structured data stored in database systems. |
| 8 | However, this data represents a fraction of all corporate information. |
| 9 | A far larger volume exists as text - in documents, web pages, manuals, reports, email, faxes, and presentations. |
| 10 | These valuable sources of business information are often inaccessible and not managed in a cost-effective manner. |
| 11 | Users accessing organization information - whether they are employees visiting an intranet portal or buyers browsing a catalog - need sophisticated support from text search infrastructure to find what they want. |
| 12 | Text is underutilized in many organizations. |
| 13 | Text assets are no longer static, physical entities. |
| 14 | Current technology allows companies to create globally interconnected systems that store text information drawn from many sources. |
| 15 | Important text assets may be hidden because it's difficult to find them. |
| 16 | Poor search quality is expensive. |
| 17 | Unlocking the value of an organization's textual information has been a long-term challenge. |
| 18 | Historically, text has been seen to require a different set of technologies for retrieval and management than other business data. |
| 19 | This misperception has burdened organizations with multiple storage and retrieval systems, and also multiple development environments. |
| 20 | This has stood in the way of effectively integrating all of the corporation's information assets. |
| 21 | As a legacy of this misperception, many companies today buy different products for solving their text searching needs and their structured data (database) searching needs. |
| 22 | Not only is this approach costly over the lifecycle of purchasing, integrating, operating and maintaining different products, but it also results in poor performance and a high latency in development of applications. |
| 23 | Further, purveyors of specialty servers can seldom deliver the high reliability, throughput and multi-platform scalability of an enterprise database. |
| 24 | What if it were possible to extend the power and advantages of relational database systems to all corporate information, including text and other unstructured data? |
| 25 | After all, text data is real data that warrants the infrastructure of a real database and proven tools for application development. |
| 26 | In this white paper, we look at such an approach in the form of Oracle Text. |
The content of each document being summarized can be represented as the TDM (term-document matrix) described in the earlier chapters. The only difference is that the columns of the matrix contain sentences instead of documents⁸. Such a matrix is called a term-sentence matrix (TSM). It is the starting point for further analysis. It indexes
Naturally, larger documents can be summarized by selecting larger fragments of them, for example paragraphs.
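The construction of a term-sentence matrix can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' method: sentences are split naively on terminal punctuation, terms are lowercased word tokens, the stopword set is a hypothetical placeholder, and each entry is a raw term count rather than any particular weighting scheme. The helper names `split_sentences`, `tokenize` and `build_tsm` are hypothetical. Rows correspond to terms and columns to sentences, mirroring the TDM layout with documents replaced by sentences.

```python
import re
from collections import Counter


def split_sentences(text):
    # Hypothetical naive splitter: break after '.', '!' or '?' followed by whitespace.
    # Real systems also need to handle abbreviations, decimals, etc.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Lowercase the sentence and keep alphanumeric runs (e.g. "it's" stays one token).
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


def build_tsm(sentences, stopwords=frozenset()):
    """Return (terms, matrix); matrix[i][j] counts terms[i] in sentence j."""
    # One Counter per sentence (i.e. per matrix column), stopwords removed.
    counts = [Counter(t for t in tokenize(s) if t not in stopwords) for s in sentences]
    # Vocabulary = union of all terms seen in any sentence, sorted for stable row order.
    terms = sorted(set().union(*counts))
    # Counter returns 0 for absent terms, so the matrix is fully populated.
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


# Usage on three sentences taken from Table 10 (nos. 16, 12 and 13).
doc = ("Poor search quality is expensive. "
       "Text is underutilized in many organizations. "
       "Text assets are no longer static, physical entities.")
sentences = split_sentences(doc)
# Hypothetical stopword list, chosen only for this illustration.
terms, tsm = build_tsm(sentences, stopwords={"is", "in", "are", "no"})
print(len(terms), "terms x", len(sentences), "sentences")
print("text:", tsm[terms.index("text")])
```

Keeping raw counts makes the structure easy to inspect; a weighting scheme (for example TF-IDF) could be applied to the finished matrix without changing how it is built.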