AD

View an example from the development of comprehensive search

Recently read "Expert one on one Oracle" first edition, Tom Kyte in the "Tuning Strategies" This chapter cites a practical example of how he targeted design DB. This example is a very real demand, and summed up with the time to develop better practices, to share in here.

This example is one sentence brief description of the "Comprehensive search." Assume that there are N tables, each table has many fields, of which we are only interested in certain fields, the demand is the user input some of the clues, the system lists all matching records. For example, users in an integrated search box, enter "tom", the system may return as the creator of "Tom Wang" or update those as "Tommy" the documents, also may return Email to "[email protected]**" user and name "tom and cat" items. That is, in many forms, and more fields on the same keywords; general, do not care about the location and spelling of keywords.

How to quickly return the result? Tom Kyte cited in the book is a lightweight example, only one table, 67 fields, 75,000 lines, and the source table is read-only. In order to index this table as soon as possible, he used some special treatment. Taking into account the writing of this version of the year, its handling is very clever. Let's create a similar table, and see how it is handled.

create sequence INC
start with 1
increment by 1
cache 200;

create table objects as select object_name from all_objects;
alter table objects add oid number(10);
update objects set oid =inc.nextval;
alter table objects  add constraint PK_OBJECTS primary key (OID);

create table x
(
 c1 varchar2(30),
 c2 varchar2(30),
 c3 varchar2(30),
 c4 varchar2(30),
 c5 varchar2(30),
 c6 varchar2(30),
 c7 varchar2(30),
 c8 varchar2(30),
 c9 varchar2(30),
 c10 varchar2(30),
 d1 char(1024), 
 d2 char(1024),
 d3 char(1024),
 d4 char(1024),
 d5 char(1024)
);

DECLARE
  i INTEGER;
  s INTEGER := 50000;
  m INTEGER;
BEGIN
  select count(0) into m from objects;

  FOR i IN 1 .. s LOOP
    INSERT INTO x
      (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, d1, d2, d3, d4, d5)
    VALUES
      ((SELECT object_name
         FROM objects
        WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       (SELECT object_name
          FROM objects
         WHERE OID = MOD(abs(dbms_random.random), m)),
       lpad('*', 1023, '*'),
       lpad('*', 1023, '*'),
       lpad('*', 1023, '*'),
       lpad('*', 1023, '*'),
       lpad('*', 1023, '*'));
  END LOOP;
END;
/



insert into x select * from x;

commit;


This creates a X table, a total of 10 million lines, of which c1 to c10 is the need to be to find out, d1 to d5 larger, entirely placeholder, the role of a real line of the size of the simulation.

How to find all contain a certain content, such as "E548D7A9" the bank? The easiest way is to find the effect is clearly very poor, in my machine (E6550, 2G), the following query took about 10 seconds before they finish. This is the single-user case, it can be said is totally unacceptable.

SELECT *
  FROM x
 WHERE UPPER(C1) LIKE '%E548D7A9%'
    OR UPPER(C2) LIKE '%E548D7A9%'
    OR UPPER(C3) LIKE '%E548D7A9%'
    OR UPPER(C4) LIKE '%E548D7A9%'
    OR UPPER(C5) LIKE '%E548D7A9%'
    OR UPPER(C6) LIKE '%E548D7A9%'
    OR UPPER(C7) LIKE '%E548D7A9%'
    OR UPPER(C8) LIKE '%E548D7A9%'
    OR UPPER(C9) LIKE '%E548D7A9%'
    OR UPPER(C10) LIKE '%E548D7A9%';

For each field with the index will quickly point?

create index IX_X_C1 on X (UPPER(C1));
create index IX_X_C2 on X (UPPER(C2));
create index IX_X_C3 on X (UPPER(C3));
create index IX_X_C4 on X (UPPER(C4));
create index IX_X_C5 on X (UPPER(C5));
create index IX_X_C6 on X (UPPER(C6));
create index IX_X_C7 on X (UPPER(C7));
create index IX_X_C8 on X (UPPER(C8));
create index IX_X_C9 on X (UPPER(C9));
create index IX_X_C10 on X (UPPER(C10));

Results and have almost no difference, or 10s. In fact, since both ends of LIKE is ambiguous, use the price index is too high, look at the implementation plan will be found, Oracle remains a choice of X table full table scan, but will not use IX_X_C? Do fast full scan. These indexes were white.

While scanning so many out more slowly, in the X, a redundancy of information stored would not need to search better?

alter table X add cs varchar2(400);
UPDATE X
   SET cs = upper(C1) || '|' || upper(C2) || '|' || upper(C3) || '|' ||
            upper(C4) || '|' || upper(C5) || '|' || upper(C6) || '|' ||
            upper(C7) || '|' || upper(C8) || '|' || upper(C9) || '|' ||
            upper(C10);
commit;
SELECT *
  FROM x
 WHERE CS LIKE '%E548D7A9%';

Unfortunately, little change, or 10S or so. Similarly, even in the cs add index will not be used.

The problem is where?

Tom Kyte in his book that the problem here is not a full table scan. In the current framework, the full table scan is inevitable. The problem is the table too. Since d1 ~ d5 existence, each line contains a lot of content does not need to be searched, in full table scan will be read when the contents of SGA, this will cause the so-called "cache thrashing": each search for each block are are physical read, the data will be out of the back in front of the cache, the cache is broken.

Found the problem, the appropriate response is to reduce the amount of data to be cached, so that all data can be cached. Tom's approach is to build a specific query another table, and set the cache option, specifically optimized for the full table scan.

 CREATE TABLE FAST_X
PCTFREE 0
CACHE
AS
SELECT upper(C1)||'|'||upper(C2)
||'|'||upper(C3)
||'|'||upper(C4)
||'|'||upper(C5)
||'|'||upper(C6)
||'|'||upper(C7)
||'|'||upper(C8)
||'|'||upper(C9)
||'|'||upper(C10) CONTENT,
ROWID ROW_ID
FROM X;

Try the query with the following statement, the effect is very good, basic is about 1s, can be regarded as a

SELECT *
  FROM X
 WHERE ROWID IN (SELECT ROW_ID FROM FAST_X WHERE CONTENT LIKE '%E548D7A9%')

As fast_x cache in the buffer, avoiding the disk i / o, full table scan under such magnitude is not as scary.

-------------------------------------------------- -------------------------------------------------- ----------------------------

These are given in the book tom solution. If your data and its like, then this method is worth learning from.

However, with the increase in the amount of data, this method will gradually show the deficiencies. Let's see what solution.

Or this form, if not the 10w line, but the 80w line, this method can also be right?

insert into x select * from x;
insert into x select * from x;
insert into x select * from x;
commit;

drop table FAST_X;
 CREATE TABLE FAST_X
PCTFREE 0
CACHE
AS
SELECT upper(C1)||'|'||upper(C2)
||'|'||upper(C3)
||'|'||upper(C4)
||'|'||upper(C5)
||'|'||upper(C6)
||'|'||upper(C7)
||'|'||upper(C8)
||'|'||upper(C9)
||'|'||upper(C10) CONTENT,
ROWID ROW_ID
FROM X;

SELECT *
  FROM X
 WHERE ROWID IN (SELECT ROW_ID FROM FAST_X WHERE CONTENT LIKE '%E548D7A9%')


Despite the use of the cache FAST_X strategy, the query time to the 10s or so. Can imagine, full table scan is O (n) complexity, even if we can complete in memory, with the increase in the amount of data, the query time will increase linearly.

Super millions of practical applications is a common number of rows, so need to find a better approach. The so-called better way, is nothing more than ideas in the database can be used to build an index on, this time Oracle Text comes in handy. Oracle 9i Oracle Text is the perfect middle of a text search engine, Oracle 8 times the name seems to be interMedia Text (uncertain), according to Tom's argument, interMedia Text with% ABC% do not support such a query, so they did not use. However, the Oracle Text, complete with features, to achieve integrated simply overkill.

The content on fast_x index for context

create index fast_x_content on
FAST_X(CONTENT)
indextype is CTXSYS.CONTEXT;

Make inquiries with the following statement

SELECT * FROM X
WHERE ROWID IN
(
SELECT ROW_ID FROM FAST_X
WHERE contains (CONTENT,'%E548D7A9%')>0
)

Execution time is about 0.5m, a lot faster. Even better, content indexing complexity is O (ln (n)), even if the data twice as large again, do not add too much time (measured about 0.7s).

For the integrated query, Oracle Text of the introduction of the core of performance problems solved. In addition, there are some issues to resolve:

1) Data source is not just a single table (such as the case of X), may be X1, X2, X3 .... Each source table in a separate building context index is not a good idea, better approach is to put all the table all the fields are auxiliary tables (such as in the example FAST_X) and going. This can reduce the index number, to reduce response time

2) The source table will not be as general as read-only cases, which have a data refresh problem. There are two parts of the data needs to be updated, first, to refresh the data X to FAST_X, the other is updated FAST_X the index. The former can achieve synchronous or asynchronous mode, synchronization is the Trigger, asynchronous you need to use MQ. If the large amount of data, please use the MQ or bar, much better performance; the latter depends on the user's patience for the content delay time, zero-tolerance need to immediately update the index, while In Yonghu to acceptance of 5 or 10 minutes under the updated would be helpful for the system performance index.

To solve the above problem, a comprehensive search using basic to set up better.

These are the integrated search feature in the Oracle database to achieve a process. In recent years, Lucene and other Java-based text search engines become increasingly popular trend now is to give this a spin-off from the database, because APP Sever on the cost and scalability is better than DB Sever, the load transfer from the DB SERVER APP Sever up a good deal. In the APP Sever the use of file system and java processing search, although the structure is more complex point, but the effect was even better than the Oracle Text, even if the search for high demand is entirely possible to build a separate Searching Server, expansion of very well.

Finally mention that after the introduction of Lucene more question, what strategy to use the metadata from the DB update to the APP Server? But that is another topic, not the scope of this paper out.
标签: increment, oracle, primary key, constraint, d2, search box, source table, brief description, matching records, d3, varchar2, spelling, d4, expert one on one oracle, tom kyte, d5
分类: Database
时间: 2010-07-07

相关文章

  1. Eclipse plug-in standards development to achieve Search functionality (change)

    Introduction: Eclipse in the Search function is very cool, such as java \ file search, powerful, and cool on t ...
  2. Sencha Touch framework development using Twitter search application

    Currently, Android and the two main mobile phone iphone application development to attract a large number of d ...
  3. Development of commonly used open source J2EE project

    Read other people's records, feel good, on the collection here, Original Source: http://blog.csdn.net/fenixsha ...
  4. Embrace the iPhone, to embrace the future of software development

    ■ publish time :2009-Jiang Feng-03-11 19:21:17 Chinese reading report IPhone programmers Houses gold of Ethan ...
  5. The development of information systems research in user needs analysis were solution

    The development of information systems research in user needs analysis were solution 【Author】 2002 -10-21 9:23 ...
  6. The use of RUP 4 +1 view approach to software architecture design [transfer] (a)

    To develop a user satisfaction of software is not an easy task, the software architect must fully grasp the wi ...
  7. Discussion: The current operation and maintenance team's role in software development

    In the past few years, with the cloud computing and the development of additional operation and maintenance ph ...
  8. (Change) What is vertical search - a comprehensive understanding of vertical search engines

    Vertical search is a certain industry, professional search engine, is the search engine segment and the extens ...
  9. [Reserved] Boot Loader startup process and development of lessons learned

    [Reproduced] http://blog.21ic.com/user1/5585/archives/2009/56077.html Date of xgywinner 2009-3-4 8:55:00 Windo ...
  10. Building Erlang development environment in Windows,

    1, download Windows operating environment otp_win32_R14B01.exe under Erlang 2, set the path to the implementat ...
  11. Linux view the process and termination process

    1. Ps command is used to view the currently running processes. grep is a search For example: ps-ef | grep java ...
  12. "Software Architecture Design" reading notes

    Recently read "software architecture color design" of this book, harvest story, happened to see this ...
  13. Java Enterprise System architecture selection considerations

    Java Enterprise System architecture selection considerations Source: Network Java is now the flourishing of a ...
  14. Basic theory test

    Software engineering model Learn about the test and had to discuss the software engineering model, because the ...
  15. Network Element Management System (EMS)

    Element Management Systems Network Element Management System (EMS) The definition of Network Element Managemen ...
  16. Lucene in Action (Simplified Chinese)

    A total of 10 part of the first part of the Lucene core 1. Contact Lucene 2. Index 3. To add a search procedur ...
  17. Principle Xiangjie spider Reptile

    First, the basic principles of search engine spiders search engine spiders that the Search Engine Spider, is a ...
  18. The basic principles of Web Spiders

    Transfer: http://playfish.javaeye.com/blog/150124 Keywords: spiders spider web Web spiders or Web Spider, is a ...
  19. shh integration example (change)

    2009-02-18 shh integration example Preferred create web.xml is mainly configured Spring Struts ActionServlet a ...