Software for Databases

What are Database Management Systems – Bottom Line

A truly professional DBMS is built around a powerful, active database server. This is not just a technical innovation. The idea of an active server fundamentally changes our view of the role, scope and principles of DBMS use, and in purely practical terms it makes it possible to choose modern, efficient methods of building global information systems.

Active Server

The idea of an active, intelligent database server did not arise by itself – it was a response to real-life challenges. In Sect. Relational Database – Basic Concepts, a general understanding of databases was formulated. However, the thoughtful reader can expand on it. Indeed, objects of the real world have, in addition to direct connections, other, more complex cause-and-effect relations with each other; they are dynamic and in constant change. These relations and processes must somehow be reflected in the database if we mean not a static storage but an information model of a part of the real world. In other words, a database must, apart from the data and the direct connections between them, store knowledge about the data and adequately reflect the processes taking place in the real world. Consequently, it is necessary to have a means of storing and managing such information.

Current Tasks

These requirements give rise to the following tasks.

First, it is necessary that the database at any time correctly reflects the state of the subject area – the data must be mutually consistent. Suppose, for example, that the Personnel database stores the information about the ordinary employees, the departments in which they work, and their managers. The following rules should be taken into account: each employee should report to a real manager; if a manager resigns, all his employees are transferred to another one and the department is reorganized; each department should be headed by a real manager; if a department is cut, its manager is transferred to the nomination reserve, etc.

Secondly, the database must reflect some of the rules of the subject area, the laws by which it functions (business rules). The factory can normally operate only if there is a sufficient stock of parts of a certain assortment. Consequently, as soon as the number of parts of a certain type becomes less than the minimum allowable, the plant must buy more parts in the right quantity.

Thirdly, it is necessary to constantly control the state of the database, to track all changes, and to react to them adequately. For example, in an automated production control system, sensors monitor the tool temperature; it is periodically transmitted to the database and stored there; as soon as the tool temperature exceeds the maximum permissible value, it is switched off.

Fourth, it is necessary that the occurrence of some situation in the database clearly and promptly affect the execution of the application program. Many programs require prompt notification of all changes in the database. Thus, in Automated Production Control Systems it is necessary to notify the programs immediately about any changes of technological processes parameters, when the latter are stored in the database. The postal service requires prompt notification of the recipient as soon as a new message is received. A broker at the stock exchange must be notified immediately of stock price changes, because delays of a few seconds can lead to heavy losses.

Notification of the occurrence of a certain state of the database and changes in it may be required in any institution to control the passage of documents. If a document to be reviewed and consecutively approved by several managers is stored in the database, each of them, in turn, will be promptly notified of the arrival of the document for his or her signature.

An important DBMS problem is data type control. As already mentioned in the chapter Relational Database – Basic Concepts, every column of every table contains data of a certain type. The data type is defined when the table is created, and each column is assigned one of the standard data types allowed in the DBMS. Thus, it turns out that only data of standard types can be stored in the database: integers and real numbers, character strings, data of the “date”, “time” and “currency unit” types – the repertoire of a real DBMS is limited to these types. What about non-standard data? After all, real life requires storing and processing data of a much wider range – planar and spatial coordinates, units of various metrics, five-day weeks (a work week in which Monday immediately follows Friday), fractions, not to mention graphical images.

Traditional Approaches

Until recently, knowledge management functions remained outside the capabilities of relational DBMSs or were very limited.

Traditionally, domain knowledge has been incorporated directly into application programs, using the capabilities of procedural programming languages. In the vast majority of cases, this approach still prevails.

Consider, for example, the Warehouse database that stores information about the availability of parts in a factory warehouse. The Warehouse Accounting application provides an accounting of already existing and newly arrived parts. Its functions include viewing the contents of the database, adding information about new parts, replacing discontinued parts with new ones, etc. There are some rules, such as “At any time the quantity of bushings type parts must not be less than 1000” (the situation with bushings in production is always tense). It is easy to see that this rule should be applied only when the number of bushings decreases. We need to check: has it decreased so much that it is less than 1,000? If so, an urgent letter should be sent to the manufacturing plant requesting that the correct number of bushings be shipped. Of course, this should be done unless a letter has already been sent before.

For this rule to apply, the program must periodically, at certain intervals, query the value in the Quantity column of the Part table for all rows that satisfy the condition Part.Name=’Bushing’. If this value becomes less than 1000, the program must send an email to the manufacturer.
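
A minimal sketch of such a polling query is shown below; the table and column names follow the example in the text, while the timing loop and the e-mail logic would live entirely in the application code.

    -- Query issued by the application at fixed intervals (polling).
    SELECT Quantity
    FROM   Part
    WHERE  Name = 'Bushing';
    -- If the value returned is below 1000 (and no letter has been sent yet),
    -- the application itself must compose and send the order to the supplier.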

What are the disadvantages of this approach?

First, implementing the rules overloads the application program and makes it harder to write and understand. Second, and this is a more significant disadvantage, when the rule itself is changed, the changes must be reflected in the program text. In cases where the rules are changed drastically, the developer has to reconsider the logic of program execution and practically rewrite it from scratch.

It would be convenient to leave only the basic data-management algorithms in the application programs, and to take the frequently changing rules of the external world out of the programs and record them in some other way. Otherwise, developers will face unpleasant surprises.

Let’s consider an example of an applied banking system. When it was created, the developers built into the data-management algorithms the rules of financial activity dictated by the legislation in force at the time. Now suppose the laws have undergone some changes (which happens all too often nowadays!). Naturally, these changes must immediately be reflected in the data-management algorithms, which necessitates modifying all the applications that make up the information system.

This in itself is a tremendous amount of work: fixing and debugging the programs, compiling and assembling them, changing the documentation, and retraining the staff.

On the other hand, the rules in question must not contradict each other. When they are implemented by a group of developers, there is no guarantee that they are mutually consistent. In fact, the rules should be formulated and controlled by a single person – the database administrator. With the traditional approach, it is practically impossible to provide centralized control over the mutual consistency of rules scattered among many programs and, what is more important for commercial organizations, it is practically impossible to control the deliberate distortion of rules by programmers.

Thus, including the rules in application programs, where the server is given the passive role of a data provider and custodian and the entire intelligent part is implemented in the program, is an outdated technology. It is fraught with large overheads when the rules change and does not provide centralized control over their consistency. It is easy to see that this technology is based on the RDA model, so popular nowadays (it was described above).

The traditional solution to the tasks of monitoring the state of a database and notifying application programs about the events occurring in it relies on the application programs polling the database, which has the following disadvantages.

The application program cannot poll the database continuously, because it would overload the server with useless queries. Polling is done at intervals determined by the programmer. Consequently, any changes in the database are detected not immediately, but after a certain period of time. This is why the traditional solution does not provide immediate notification, while in real-time applications this is a key requirement. Constant polling of the database also greatly affects system performance – the polling programs overload the server and the network with their requests. Finally, the cumbersome constructions in the program text that implement polling seriously impede writing and understanding it.

Another important real-life requirement is the synchronization of multiple programs accessing the database. Consider an example. The financial system of a factory needs to keep track of payments for products arriving in a certain account. As soon as the money arrives (everyone knows how important this event is – “the money has arrived”!), all the application programs included in the financial system must be notified of it. After that, each of them can take some action. Thus, the Product Shipment program has to find the corresponding order in the database, determine the range of ordered products, their quantity and delivery dates, form and send a shipment order to the finished-goods warehouse, and print the accompanying documents – that is, prepare the products for shipment. The Accounting program, in particular, should determine from the date of receipt whether the payment is overdue, and if so, accrue a penalty. All of these actions are triggered by an event in the database: the receipt of money in an account (in terms of a relational DBMS this means adding a new row to the Payments table). The work of all the programs is synchronized by this event.

The traditional solution to the synchronization problem relies on the standard means of a multitasking operating system. However, such synchronization can be connected with changes occurring in the database only through constant polling of the Payments table. The disadvantages are obvious: the interrelation of the programs is provided at the operating system level, while this function should be performed by the DBMS, and we again have to resort to polling, whose disadvantages we have already discussed.

A common way to overcome the limitation on data types in the DBMS is to convert the data of new types to standard ones. As a rule, new data types are treated as integers or real numbers, or as strings of characters.

Consider an example. In a number of countries, including the United States, feet and inches are used alongside the metric system to measure length. The rules of arithmetic in this system differ from the decimal ones: three feet eleven inches plus one inch equals four feet (3′11″ + 1″ = 4′). The standard set of data types does not allow you to define and operate on data in this system. The values must be converted to floating-point numbers, that is, represented as 3.91666 (three feet eleven inches) and 0.08333 (one inch), respectively. Performing the addition (3.91666 + 0.08333 = 3.99999), we see that such a representation leads to a loss of precision (the result should be exactly four feet!).

Consequently, mapping new data types directly onto standard ones is fraught with errors. The data-conversion functions have to be undertaken by the application programs (there is no one else to do it). The result is a very cumbersome scheme: the program retrieves data of new types from the database, where they are represented as standard types, converts and processes them, then converts them back and passes them to the server for storage. The server takes no part in processing data of new types under this scheme, because it treats them as standard and would process them as standard (which is where errors would occur).

Today, for example, the problem of large numbers has become urgent for domestic banks. The scale of calculations has grown so much that some final operations involve sums exceeding the range of standard integers. Therefore, programs have to convert integers into strings and back, which by no means simplifies their logic. Storing and handling large integers as real numbers leads to a loss of accuracy, which is unacceptable in a financial system.

As we will see below, the solution to this problem lies in defining a new data type, “large integers”. Once this is done, the database server begins to “understand” the new data type and perform all operations on it that are specific to integers.

Another major requirement for modern DBMS is the ability to store large unstructured objects (Binary Large OBjects – BLOBs). Responding to this requirement, the DBMS developers provide for such a possibility. However, at the same time, the server only stores such objects and does not have the ability to process them. For example, working with graphical objects, the server does not distinguish between the image of a BMW car and the structure of DNA. The server has to pass them for interpretation to an application program that can figure out what is what.

The integrity problem should not be forgotten either. If one program interprets some type of data, converting it into its own format, then nothing prevents another program from doing the same – with the only difference that the format of representation of the same data will already be different. This makes it fundamentally impossible to control the integrity of the data, if only because different programs interpret them differently.

Note that the absence of non-standard data types in a DBMS also greatly affects performance, since normal interaction between the client and the server is possible only when both of them adequately understand the types of data being handled. In fact, in this scheme the appearance of non-standard data types degrades the system to a file-server architecture. Without understanding the new data type, the server can only obediently store it, not knowing how to compare or sort the data or perform any operations on them.

On the other hand, this limitation leads to the fact that the main burden of handling non-standard data types falls on the application programs. At the same time, the issue of data integrity remains open, since no centralized control of data types, which is undoubtedly a function of the database server, is supported.

So, in the traditional technology, the solution of the problems discussed above falls entirely on the applications. The disadvantages of the traditional technology are a consequence of the fact that in the “client-server” interaction model the latter is given a mostly passive role. First, the database server has no functions for storing and processing knowledge about the subject area. Second, monitoring the state of the database and programming reactions to its changes are beyond the server’s capabilities. Third, the passive server has no means of tracking events in the database, nor the means to influence the work of application programs and synchronize them.

Modern Solutions

The ideas implemented in the third generation DBMS (not yet, unfortunately, in all DBMSs) are that knowledge is taken out of the scope of application programs and formalized as database objects. The knowledge application functions are performed directly by the database server.

This architecture is the embodiment of the concept of an active server. It relies on four “pillars”:

  1. Database procedures;
  2. Rules (triggers);
  3. Database events;
  4. User-defined data types.

Database procedures

In different DBMSs they are called stored, attached, shared, and so on. Below we will use the terminology adopted in the Ingres DBMS.

The use of database procedures has four purposes.

First, it provides a new independent level of centralized data access control by the database administrator.

Secondly, one procedure can be used by several application programs, which significantly reduces the time needed to write programs by formalizing their common parts as database procedures. A procedure is compiled and placed in the database, becoming available for multiple calls. Since its execution plan is defined once, at compilation time, the optimization phase is skipped on subsequent calls, which significantly saves the system’s computational resources.

Thirdly, the use of database procedures can significantly reduce network traffic in client-server systems. The application program that calls a procedure sends only its name and parameters to the server. The procedure usually concentrates repetitive fragments from several application programs (Figure 13). If these fragments remained part of the program, they would load the network by sending full SQL queries.

Finally, fourth, database procedures, combined with the rules discussed below, provide the administrator with a powerful means of maintaining database integrity.

In modern DBMSs, a procedure is stored directly in the database and is controlled by its administrator. It has parameters and returns a value. A database procedure is created by the CREATE PROCEDURE operator and contains variable definitions, SQL statements (for example, SELECT, INSERT), condition checking statements (IF/THEN/ELSE), loop statements (FOR, WHILE), and some others.

Suppose, for example, that you want to develop a procedure that moves an ordinary employee into the pool of managerial candidates. The Assignment procedure moves the rows with the specified employee number from the Employee table, which contains information about employees, to the Reserve table. The employee number – an integer (type integer) that cannot be null – is a parameter of the procedure and is passed to it when the procedure is called from an application program using the EXECUTE PROCEDURE operator.
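
A sketch of such a procedure in Ingres-flavoured SQL might look as follows; the exact syntax (parameter declarations, the colon before parameter names) differs between DBMSs, so treat this as an illustration rather than a ready-made definition. The table and column names are taken from the example above.

    -- Illustrative only; Ingres-flavoured syntax, simplified.
    CREATE PROCEDURE Assignment (emp_no INTEGER NOT NULL) AS
    BEGIN
        -- Copy the employee's row into the managerial candidate pool ...
        INSERT INTO Reserve
            SELECT * FROM Employee WHERE Number = :emp_no;
        -- ... and remove it from the Employee table.
        DELETE FROM Employee WHERE Number = :emp_no;
    END;

    -- Called from an application program (1054 is an arbitrary employee number):
    EXECUTE PROCEDURE Assignment (emp_no = 1054);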

Rules

The rules (triggers) mechanism allows you to program the handling of situations arising from any changes in the database.

A rule is assigned to a database table and is applied when rows are inserted into, deleted from or updated in that table.

Applying a rule consists in checking the conditions specified in it and, if they are met, calling the database procedure specified within the rule. Importantly, a rule can be applied both before and after an update; therefore, an operation can be canceled.

Thus, a rule allows you to define the server’s reaction to any change in the state of the database. Rules (as well as procedures) are stored directly in the database independently of the application programs.

One of the purposes of the rules engine is to reflect some external rules of the organization. Suppose, for example, the Warehouse database contains a Part table that stores information about the availability of parts in the factory warehouse. One of the rules of plant activity is that it is unacceptable when the number of parts of any type in the warehouse becomes less than some number (e.g., 1000).

This real-life requirement is described by the Check_part rule. The rule is applied when the Count column of the Part table is updated: if the new value in the column is less than 1000, the Order_parts procedure is executed. The procedure receives as parameters the number of the part of the given type and its remainder (the number of such parts in stock).

Thus, if a situation occurs where the number of parts of a certain type in stock falls below the required number, a database procedure is launched that orders the missing quantity of parts of this type. The ordering comes down to sending a letter (e.g., by e-mail) to the factory or workshop that makes the parts in question. All this happens automatically, without user intervention. This example is, of course, simplified – it does not take into account that the order might already have been placed.
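
A hedged sketch of how the rule and the procedure might be declared is given below; the syntax is Ingres-flavoured and simplified, and the Orders table used by Order_parts is a hypothetical stand-in for the actual ordering (e-mail) mechanism.

    -- Procedure that places an order for the missing parts.
    CREATE PROCEDURE Order_parts (part_no INTEGER NOT NULL,
                                  remainder INTEGER NOT NULL) AS
    BEGIN
        -- In a real system this would send the letter to the supplier;
        -- here the order is simply recorded in a hypothetical Orders table.
        INSERT INTO Orders (part_number, quantity)
            VALUES (:part_no, 1000 - :remainder);
    END;

    -- Rule applied when the Count column of the Part table is updated.
    CREATE RULE Check_part
        AFTER UPDATE(Count) OF Part
        WHERE new.Count < 1000
        EXECUTE PROCEDURE Order_parts (part_no = new.Number,
                                       remainder = new.Count);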

The most important purpose of the rules engine is to ensure the integrity of the database. One aspect of integrity, referential integrity, refers to the relationship between two tables. Recall that this relationship is maintained by foreign keys.

For example, assume that the Supervisor table contains information about supervisors and the Employee table contains information about the employees of an organization (see the example in Section Relational Database – Basic Concepts). The Supervisor Number column is a foreign key of the Employee table and is this table’s reference to the Supervisor table.

To ensure the integrity of these links, two requirements must be met. First, if a new row is added to the Employee table, the value of its Supervisor Number column must be taken from the set of values in the Number column of the Supervisor table (an employee can be subordinate only to a real supervisor). Second, if a row is deleted from the Supervisor table, there must not be a single row left in the Employee table whose Supervisor Number value is identical to the value of the Number column in the row being deleted (all employees whose supervisor has quit must be transferred to another supervisor).

How are these requirements taken into account in practice? Obviously, rules should be created to implement them. The first rule, Add_employee, is triggered when a row is inserted into the Employee table; applying it consists in calling the Check_Supervisor procedure, which checks whether the set of values in the Number column of the Supervisor table contains a value identical to the Supervisor Number field of the row being added. If it does not, the procedure rejects the insertion. The second rule is applied when an attempt is made to delete a row from the Supervisor table; it consists in calling a procedure that compares the values in the Supervisor Number column of the Employee table with the value of the Number field in the row being deleted. Wherever there is a match, the value in the Supervisor Number column is updated.
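
The first of these rules might be sketched as follows (Ingres-flavoured and simplified; RAISE ERROR is used here to reject the insertion, and the exact select-into-variable syntax varies between dialects).

    CREATE PROCEDURE Check_Supervisor (sup_no INTEGER NOT NULL) AS
    DECLARE cnt INTEGER;
    BEGIN
        -- Does the referenced supervisor exist?
        SELECT COUNT(*) INTO :cnt FROM Supervisor WHERE Number = :sup_no;
        IF :cnt = 0 THEN
            -- No such supervisor: reject the insertion.
            RAISE ERROR 50001 'Unknown supervisor number';
        ENDIF;
    END;

    CREATE RULE Add_employee
        AFTER INSERT OF Employee
        EXECUTE PROCEDURE Check_Supervisor (sup_no = new.Supervisor_Number);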

The rules mechanism also makes it possible to implement more general integrity constraints. Suppose, for example, that the Employee table contains information about employees, including each employee’s name and the name of the department in which he or she works, while the Department table stores for each department the number of its employees in the Number_of_Employees column. One integrity constraint is that this number must match the number of rows for that department in the Employee table. How can this constraint be taken into account? One possible solution is an Add_employee rule that is applied when a row is inserted into the Employee table and runs the New_employee procedure, which in turn increases the value of the Number_of_Employees column by one. The parameter of the procedure is the name of the department.
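
A compact sketch of this second variant (again Ingres-flavoured and illustrative; note that the text reuses the rule name Add_employee, so in a real schema the two rules would need distinct names):

    CREATE PROCEDURE New_employee (dept_name VARCHAR(30) NOT NULL) AS
    BEGIN
        -- Keep the per-department counter in step with the Employee table.
        UPDATE Department
        SET    Number_of_Employees = Number_of_Employees + 1
        WHERE  Name = :dept_name;
    END;

    CREATE RULE Add_employee
        AFTER INSERT OF Employee
        EXECUTE PROCEDURE New_employee (dept_name = new.Department);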

Of course, in practice, the rules engine is used to implement more complex and sophisticated integrity constraints.

The rules engine is the heart of an active database server. Rules are analogous to triggers, which first appeared in the Sybase database (as far as the author knows) and were later implemented in one form or another and under one name or another in most multi-user database management systems.

Events in the database

Database events allow application programs and the database server to notify other programs about certain database events and thus synchronize their work. SQL language operators that provide notification are often called database event alerts. The event management functions are the responsibility of the database server.

The event mechanism is used as follows. First, an event flag is created in the database (the CREATE DBEVENT operator); its state will notify the application programs that the corresponding event has occurred. Next, every application program whose operation may be affected by the event executes the REGISTER DBEVENT operator, which notifies the database server that the program is interested in receiving messages about this event. Any application program or database procedure can now raise the event with the RAISE DBEVENT statement. Once the event has been raised, every registered program can receive it by requesting the next message from the event queue (the GET DBEVENT statement) and requesting information about the event, in particular its name (the INQUIRE_SQL statement).

The following example illustrates the processing of all events from the queue.
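
Roughly, such processing might be sketched as follows. This is only an outline under stated assumptions: the event name New_payment is hypothetical, the loop itself belongs to the application (host language or 4GL), and the exact spelling of the GET DBEVENT and INQUIRE_SQL calls varies between DBMS versions.

    -- Once, in the database: create the event flag (hypothetical name).
    CREATE DBEVENT New_payment;

    -- In every interested application: subscribe to the event.
    REGISTER DBEVENT New_payment;

    -- In the application, repeated until the queue is empty:
    GET DBEVENT;                                 -- take the next event from the queue
    INQUIRE_SQL (:event_name = DBEVENTNAME);     -- ask which event it was
    -- if :event_name is empty, there are no more events;
    -- otherwise the program reacts to it and repeats the two statements above.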

Let’s look at an example from a production system that illustrates the use of an event mechanism in a database together with rules and procedures. Events are used to determine when a work tool becomes too hot and must be turned off.

A rule is created that is applied whenever a new tool temperature value is entered in the Tool table. As soon as it exceeds 500 degrees, the rule calls the Disconnect_Tool procedure.

Finally, a Tool Monitor application program is created to monitor the status of the tools. It registers with the server as a recipient of the Overheat event using the REGISTER DBEVENT statement. When the event occurs, the program sends a message to the user and issues the signal needed to shut down the tool. The overall sequence of actions is as follows:

  • The Tool Monitor application periodically records the current parameter values of many different tools, read from sensors;
  • The same program enters each new temperature value for a tool into the Tool table;
  • Whenever this happens, i.e. the value in the Temperature column of the Tool table is updated, the Overheat_instrument rule is applied;
  • Applying the rule consists in checking the new temperature value; if it exceeds the maximum permissible value, the Disconnect_Tool procedure is run;
  • The procedure changes the value in the Status column of the Tool table to ‘OFF’;
  • It also raises the Overheat event;
  • The Tool Monitor program receives (intercepts) the Overheat event;
  • It sends a message to the dispatcher;
  • And it shuts down the tool.
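
A sketch of this rule-procedure-event chain is given below. It is illustrative only: the syntax is Ingres-flavoured and simplified, and the names follow the example in the text.

    -- Event raised when a tool overheats.
    CREATE DBEVENT Overheat;

    -- Procedure that switches the tool off and notifies registered programs.
    CREATE PROCEDURE Disconnect_Tool (tool_no INTEGER NOT NULL) AS
    BEGIN
        UPDATE Tool SET Status = 'OFF' WHERE Number = :tool_no;
        RAISE DBEVENT Overheat;    -- every registered program will receive it
    END;

    -- Rule applied whenever the Temperature column of the Tool table is updated.
    CREATE RULE Overheat_instrument
        AFTER UPDATE(Temperature) OF Tool
        WHERE new.Temperature > 500
        EXECUTE PROCEDURE Disconnect_Tool (tool_no = new.Number);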

If traditional database polling methods were used, the logic would be completely different. An additional program would have to be developed that periodically performs a selection from the Tool table based on the “Temperature > 500” criterion. This would seriously affect efficiency, since frequent SELECT operations incur considerable overhead.

Of course, this example serves only to illustrate the rule-procedure-event mechanism and does not in any way reflect real process control in manufacturing.

User-Defined Data Types

The data type problems described above are solved by integrating new data types into the server. Unfortunately, not all modern DBMSs support user-defined data types. So far, only the Ingres DBMS includes such a mechanism. It allows the programmer to define his own data types and operations on them and to use them in SQL statements. To define a new data type, you need to write and compile the corresponding functions in C and then link them with certain Ingres modules using the link editor. Note that introducing new data types is, in essence, a modification of the DBMS kernel. It is also important that in Ingres user-defined data types can be parameterized.

Defining a new data type comes down to specifying its name, size and identifier in a global structure describing data types. To make the new type usable, the programmer must also develop the functions that implement the standard operations on it (comparison, conversion to different formats, etc.); the interface of these functions is predefined. Pointers to these functions are elements of the global structure. Once a new data type is defined, all operations are performed on it as on a standard data type. Allowing the user to create his own data types is essentially one of the steps in the development of relational DBMSs towards object-relational systems.
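
Once the new type has been registered with the server, it can be used in SQL like any built-in type. A purely hypothetical illustration, assuming a user-defined feet_inches type along the lines of the earlier length example:

    -- Hypothetical: assumes a user-defined type feet_inches has been
    -- registered with the server as described above.
    CREATE TABLE Pipe
    (
        Number  INTEGER NOT NULL,
        Length  feet_inches        -- user-defined, not a built-in type
    );

    -- The server itself now compares and sorts such values:
    SELECT Number, Length
    FROM   Pipe
    WHERE  Length > '3 ft 11 in'
    ORDER BY Length;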

Distributed data processing

One of the main features of modern information systems is their distributed nature. Their scale is growing; they cover more and more sites all over the world. The current level of decision making and of operational management of information resources requires ever greater decentralization. Information systems are in constant development: new segments are added to them and the range of functions of the existing ones is expanding. An example of a distributed system is the ticket reservation system of a large airline with branches in different parts of the world.

The main problem of such systems is the organization of distributed data processing. The data reside on computers of different models and manufacturers, running different operating systems, and are accessed by heterogeneous software. The computers themselves are geographically remote from one another and located in different parts of the world.

Two technologies have responded to the challenges of real life: Distributed Database technology and Data Replication technology.

A distributed database is a database comprising fragments of several databases located on different nodes of a computer network and, possibly, managed by different DBMSs. A distributed database looks like an ordinary local database from the point of view of users and applications. In this sense, the word “distributed” reflects the way the database is organized, not its external characteristic (the “distributedness” of the database should not be visible from the outside).

In contrast to distributed databases, data replication implies rejection of their physical distribution and relies on the idea of data duplication in different nodes of the computer network. The details, advantages and disadvantages of each technology will be described below.

However, before turning to the problems of distributed data processing, it is necessary to deal with some aspects of networking (access to remote data).

Aspects of Networking

In Sect. Database Server, four models of client-server technology were discussed. The traditional and most popular one is the remote data access (RDA) model. Let us consider it again, in more detail. There is a computer running the front-end programs (which implement both the user-interface functions and the application functions) – the client, usually called the local node. It is connected over the network to the computer on which the database server and the database itself reside – usually called the remote node. All problems arising from client-server interaction must be solved by a special DBMS component called the communication server (DBMS Server Net). To support client-server interaction, it must run on the remote node, while a communication program interacting with it (DBMS Client Net) must run on the local node.

The interaction between application programs (clients) and the database server is based on a number of fundamental principles that determine the functionality of a modern DBMS in terms of network interaction and distributed data processing, among them:

  • Location Transparency;
  • Network transparency;
  • Automatic conversion of data formats;
  • Automatic code translation;
  • Interoperability.

Location Transparency

Transparent (for the user) access to remote data implies that application programs use an interface to the database server that allows data to be transferred over the network from one node to another without requiring any modification of the program text. In other words, access to information resources must be completely transparent with respect to the location of the data.

Any user or any application program operates with one or more databases. When the application program and the database server are executed on the same node, there is no location problem. To access the database, the user or program only needs to specify the database name, for example: SQL Dbname.

However, when the application program runs on the local node while the database is located on a remote node, the problem of identifying the remote node arises. To access a database on a remote node, you must specify the remote node name and the database name. If a rigidly fixed node name is used in the “node_name, database_name” pair, the application program becomes dependent on the database location. For example, an access to the database “host::stock”, where the first component is the node name, is location dependent.

One possible solution to this problem is to use virtual node names. They are managed by a special DBMS software component, the Name Server, which addresses client queries to the servers.

When the DBMS Client Net component is installed on a local node, a node-identification procedure is performed: the real name of the remote node is assigned a virtual name, which is then used when accessing the database. If the database is moved to another node, no changes need to be made to the application program – it is enough to map the virtual name to the name of the new node.

Network Transparency

The client and the server communicate over a network with a specific topology, and a specific protocol is always used to support the communication. Consequently, client-server communication must be organized so as to be independent of both the network hardware and the network communication protocols used. To provide users and programs with transparent access to remote data in a network of heterogeneous computers, the communication server must support the widest possible range of network protocols (TCP/IP, DECnet, SNA, SPX/IPX, NetBIOS, AppleTalk, etc.).

Automatic conversion of data formats

As soon as several computers of different models, running different operating systems, are connected into a network, the question of reconciling data representation formats immediately arises. Indeed, the network may contain machines that differ in word size (16-, 32- and 64-bit processors), in the byte order within a word, in the representation of floating-point numbers, and so on. The job of the communication server is to reconcile formats between the remote and local nodes at the data-exchange level, so that data retrieved by the server from the database on the remote node and transmitted over the network are correctly interpreted by the application program on the local node.

Automatic translation of codes

In a heterogeneous computing environment, the interaction between client and server also raises the task of code translation. The server may work with one code table (e.g., EBCDIC) and the client with another (e.g., ASCII), which leads to a mismatch in the interpretation of character codes. Therefore, if the local node uses one code table and the remote node another, translation of the codes must be ensured when requests are transmitted over the network and when responses to them are received. The solution to this problem also falls on the communication server.

We have considered the details of the interaction of a single client-server pair. However, in real life the database server must serve many client requests simultaneously – hence, at any one moment there may be several such pairs. And all the interaction problems discussed above must be solved by the communication server for all these interacting pairs.

In systems with a one-to-one architecture (see Sect. Database Server), for the database server to serve multiple clients simultaneously, a separate communication server has to be loaded for each client-server pair. As a result, the load on the operating system grows, and the total number of its processes consuming computing resources increases dramatically. This is one of the disadvantages of the one-to-one architecture.

This is why it is important for modern distributed DBMS to have a multithreaded communication server, which supports multiple clients simultaneously accessing the server. On each network node, it supports multiple pairs of client-server connections and allows multiple independent database sessions to exist simultaneously.

Distributed Databases

The essence of a distributed database is expressed by the formula: “Access to a distributed database looks to the client exactly the same as access to a centralized database.”

Let’s take an example. Suppose the Warehouse database is located on Node_1 and the Enterprise database on Node_2. The first contains the Part table, and the second the Supplier table. Suppose also that the distributed Nomenclature database must include the Part and Supplier tables, which physically belong to different databases; logically it is treated as a single, non-distributed database. Below are the SQL statements that establish references to the local database tables. They are included in the distributed database description and passed to the distributed database server discussed below.
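
The statements might look roughly as follows. This is only loosely modelled on the Ingres/STAR REGISTER ... AS LINK statement; the exact syntax differs between products and versions, and the node names are those used in the example.

    -- Make the remote tables visible in the distributed Nomenclature database.
    REGISTER TABLE Part AS LINK
        WITH NODE = node_1, DATABASE = warehouse;

    REGISTER TABLE Supplier AS LINK
        WITH NODE = node_2, DATABASE = enterprise;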

Thus, the description of the distributed database Nomenclature explicitly sets references to specific tables of actually existing databases; however, when working with such a database, this physical distribution of data is transparent to the user.

So far, we have talked about the problems of network interaction between the client and the server. Their solution by means of a communication server is a necessary (but not sufficient) condition for the support of distributed databases. The following problems remain unsolved so far:

  • Name management in a distributed environment;
  • Optimization of distributed queries;
  • Management of distributed transactions.

The first problem is solved by using a global data dictionary. It stores information about the distributed database: the location of the data, the capabilities of other DBMSs (if gateways are used), information about network transmission speeds for different topologies, and so on.

The global data dictionary is the mechanism for tracking the location of the objects of a distributed database. The data may be stored on the local node, on a remote node, or on both, and their location must remain transparent both for the end user and for the programs. There is no need to specify the location of the data explicitly – a program must be completely independent of which nodes host the data it operates on.

As for the second problem, it requires an intelligent solution. A distributed query affects several databases on different nodes, and the volumes of data retrieved may differ greatly. Let us turn to the Nomenclature database, which is distributed across two network nodes. The Part table is stored on one node and the Supplier table on the other. The size of the first table is 10,000 rows, the size of the second is 100 rows (many parts are supplied by a small number of suppliers).

The query result is a table containing the Part_name column from the Part table and the Supplier_name and Supplier_address columns from the Supplier table. In other words, the resulting table is a join of the two tables, based on the Supplier_number column of the Part table (a foreign key) and the Supplier_number column of the Supplier table (the primary key).
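
In SQL the query itself is an ordinary join; its distributed nature is visible only to the server:

    SELECT p.Part_name, s.Supplier_name, s.Supplier_address
    FROM   Part p, Supplier s
    WHERE  p.Supplier_number = s.Supplier_number;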

This query is distributed because it affects tables belonging to different local databases. For it to be executed normally, both source tables must be brought to the same node; consequently, one of the tables has to be transferred over the network. Obviously, it should be the smaller one, i.e. the Supplier table. The distributed query optimizer must therefore take table sizes into account; otherwise the query may take unpredictably long to execute.

Apart from table sizes, the distributed query optimizer must take into account many additional parameters, including statistics on the distribution of data among the nodes, the volume of data transferred between nodes, the speed of communication lines, data storage structures, the relative processor performance of different nodes, etc. All of these data are held precisely in the global data dictionary.

As far as distributed transaction processing is concerned, this topic will be discussed in the next Section Transaction processing.

The solution of all three tasks described above is entrusted to a special DBMS component – Distributed Database Server.

If the database is located on one node and the database server and the application program run there as well, neither a communication server nor a distributed database server is required. If the application program runs on the local node while the database and the database server are on a remote node, a communication server is needed on the remote node and a client communication program on the local node.

If local databases are located on several nodes, both a distributed database server and a communication server are needed to access the distributed database.

The most important requirement for a modern DBMS is interoperability. This quality can be interpreted as the openness of the system, which allows it to be embedded as a component in a complex heterogeneous distributed environment. Interoperability is achieved both through the use of interfaces conforming to international, national and industry standards and through special solutions.

For DBMS this quality means the following:

  • The ability of applications created by the development tools of a given DBMS to operate on databases in an “alien” format as if they were their own databases;
  • The property of a DBMS that allows it to serve as a data provider for any applications created by third-party development tools that support a certain database-access standard.

The first is achieved by using gateways, the second by using the ODBC interface, which will be discussed in Section Interaction with PC-oriented databases.

So far we have considered homogeneous databases, that is, databases in a particular DBMS format. At the same time, the DBMS can access the database in a different format. This is done with the help of a gateway. For example, the Ingres DBMS accesses the database in Rdb format through a special gateway. If an Alpha DBMS accesses a database in Beta format (or just a Beta database), Alpha is said to have a gateway to Beta.

Modern information systems require access to heterogeneous databases. This means that the application program must use such means to implement queries to the databases so that the queries are understandable to different DBMSs, both relational and those relying on other data models. One possible way is a generalized set of different SQL language dialects (as is done, for example, in the OpenIngres DBMS).

Consider an example in which a global database is distributed over three nodes. Node A is a VAX 6000/560 computer running VMS and the Rdb DBMS; the local Enterprise database in Rdb format resides there. The second node (B) is a SUN Sparc Server 1000 running the Solaris operating system; it runs the Ingres DBMS and hosts the local Warehouse database in Ingres format. An IBM mainframe with the MVS operating system and the DB2 DBMS acts as node C; it hosts the local Tool database in DB2 format.

The distributed database server – a component of the Ingres DBMS – runs on node B. Ingres communication servers run on all three nodes. Nodes A and B use TCP/IP for communication, while nodes B and C communicate in accordance with the SNA standard.

The local databases on all three nodes are managed completely autonomously. The distributed Production database contains tables from all three local databases. For the distributed database server to access the Enterprise database, a gateway from Ingres to Rdb is required; to access the Tool database, a gateway from Ingres to DB2.

The gateway from Ingres to DB2 allows you to manipulate data in DB2 format as if it were data in Ingres format. The gateway from Ingres to Rdb allows you to manipulate data in Rdb format as if it were data in Ingres format.

All these details are invisible to the end user, who works with the Production database as if it were a centralized Ingres database. This is what fully transparent data access means.

Note that distributed database technology protects software investments. It can be regarded as a bridge from mainframe systems and non-relational DBMSs to modern professional DBMSs running on RISC computers. It allows applications to be developed for the new platform while giving them access to the vast amounts of information held on large computers, thus ensuring a smooth and painless transition.

Data replication technology

The fundamental difference between data replication technology and distributed database technology (often called STAR technology for short) is the rejection of physically distributed data. Its essence is that any database, both for the DBMS and for the users working with it, is always local; the data are always located on the node where they are processed; and all transactions in the system complete locally.

Data replication is the asynchronous transfer of changes to the source database objects to databases belonging to other nodes of the distributed system. The replication functions are performed by a special DBMS module – the data replication server, or replicator. Its task is to keep the data in the target databases identical to the data in the source database. The replicator is launched when a rule fires (see the Database Server section) that intercepts any changes to the replicated database object. It is also possible to control the replicator programmatically by means of database events.

The basis of replication is the database transaction. At the same time, changes can be transferred in groups of transactions, periodically or at a specified point in time, which makes it possible to examine the state of the receiving database as of a particular moment.

The details of data replication are completely hidden from the application program; its functioning does not depend on the work of the replicator, which is entirely under the control of the database administrator. Consequently, it is not necessary to modify the program to transfer it to the distributed environment with replicated data.

Distributed database technology and data replication technology are, in a sense, antipodes. The cornerstone of the former is the synchronous completion of transactions on several nodes of the distributed system at once, i.e. the synchronous commitment of changes to the distributed database. STAR technology’s Achilles’ heel is its stringent requirements on the performance and reliability of communication channels. If the database is distributed over several geographically remote nodes connected by slow and unreliable communication channels, and the number of simultaneous users is in the dozens or more, the probability of a distributed transaction being committed within a foreseeable time interval becomes very low. Under such conditions (characteristic, by the way, of most domestic organizations) distributed data processing is practically impossible.

A real alternative to STAR technology is data replication technology, which does not require the synchronous commitment of changes (and this is its strong point). In fact, not all tasks require that the databases on different nodes be identical at every moment; it is sufficient to maintain identity only at certain critical points in time. Therefore, changes can be accumulated in one node in the form of transactions and periodically copied to other nodes.

Let’s summarize the obvious advantages of data replication technology. First, the data are always located where they are processed, so the speed of access to them increases significantly. Second, transferring only the operations that modify data (rather than all remote data access operations, as in STAR technology), and doing so asynchronously, significantly reduces traffic. Third, from the point of view of the receiving database, the replicator acts as a process initiated by a single user, whereas in a physically distributed environment all users of the distributed system work with every local server, competing with one another for resources. Finally, fourth, no prolonged communication failure can disrupt the transfer of changes: replication implies buffering the stream of changes (transactions), so after communication is restored, transmission resumes from the transaction at which replication was interrupted.

Data replication technology is not without disadvantages that follow from its very nature. For example, conflicts between two versions of the same record cannot be ruled out entirely: they may arise when, because of that same asynchrony, two users on different nodes modify the same record before the changes from the first database have been transferred to the second. Consequently, when designing a distributed environment based on data replication technology, it is necessary to foresee such conflict situations and to program the replicator with some way of resolving them.

To conclude this discussion, it should be noted that data replication is neither another technological fad nor a whim of DBMS developers. The viability of the replication concept is confirmed by the experience of its use in an area with the highest reliability requirements – banking information systems.

Interaction with PC-based DBMSs

Originally, professional DBMSs were created for powerful high-performance platforms – IBM, DEC, Hewlett-Packard, Sun. But then, given the growing popularity and widespread use of personal computers, developers began to port DBMSs to the operating environments of desktop computers (OS/2, NetWare, UnixWare, SCO UNIX).

At present, most DBMS vendors are developing their systems in three directions. The first is improving DBMSs for corporate information systems, characterized by a large number of users (100 and more), very large databases (often called Very Large Data Bases – VLDB), mixed workloads (operational transaction processing combined with decision support), and so on. This is the traditional domain of mainframe systems and of the RISC computers approaching them in performance.

Another, no less important direction is DBMSs supporting so-called workgroups. It is characterized by a relatively small number of users (the small-scale nature of DBMS use), while retaining all the “multi-user” qualities. Systems of this class are mainly oriented towards “office” applications that do not require special features. Thus, most modern multi-user DBMSs have versions that run under the Novell NetWare network operating system. Here the DBMS kernel is implemented as a NetWare Loadable Module (NLM) running on the file server, and the database is also located on the file server. SQL queries come to the DBMS kernel from applications running on the network stations – personal computers (note that, despite the use of a file server, we are dealing here with the RDA model).

Finally, a new impetus in the development was given to desktop versions of DBMS, aimed at personal use, primarily in the MS Windows operating system (systems of this class are informally called “light”).

The desire of DBMS vendors to offer essentially three versions of their systems, covering the entire range of possible applications, looks extremely attractive to users. Indeed, it is extremely convenient for a specialist to have a local database on a portable computer (constantly used during business trips) in the same format, and processed by the same rules, as the firm’s stationary corporate database, to which the collected data can easily be delivered.

In recent years (1987–94), many programs have been developed in our country that rely on DBMSs such as PARADOX, FoxPRO, dBASE IV and Clipper. When migrating to a more powerful multi-user DBMS, users naturally want to integrate their existing developments into the new environment. For example, there may be a need to store local data on a personal computer and access it using FoxPRO, while at the same time having access to a global database managed by Oracle. Organizing such access, where a program can work simultaneously with both a personal and a multi-user DBMS, is a challenge for the following reason.

As is well known, the developers of PC-oriented DBMSs initially used their own database interfaces, without taking the SQL standard into account. Only later did they gradually begin to include in their systems the ability to work with the database using SQL, whereas for truly multi-user DBMSs the SQL interface is the de facto standard. Thus the task arose of harmonizing the interfaces of different classes of DBMS. It can be solved in several ways, but most of them are ad hoc. Let us consider the most common solution to this problem.

Microsoft specialists have developed the Open Database Connectivity (ODBC) standard. It is a standard application programming interface (API) that allows programs running in the Microsoft Windows environment to interact (via SQL statements) with different DBMSs, both personal and multi-user, running under different operating systems. In effect, the ODBC interface universally separates the purely application-oriented side of a program (spreadsheet processing, statistical analysis, business graphics) from the actual data processing and exchange with the DBMS. The main purpose of ODBC is to make the interaction between the application and the DBMS transparent, independent of the class and features of the DBMS used (i.e., portable with respect to the DBMS).

Let us note that the ODBC standard is an integral part of the WOSA (Windows Open Services Architecture) family of standards, which facilitate the writing of applications and ensure their vertical openness.

The ODBC interface (Figure 17) provides mutual compatibility between the server and client components of data access. To implement uniform access to different DBMSs, the concept of an ODBC driver (a dynamically loaded library) was introduced.

The ODBC architecture contains four components:

  1. Application;
  2. Driver manager;
  3. Drivers;
  4. Data sources.

The roles are distributed among them as follows. The application calls ODBC functions to execute SQL statements and receives and interprets the results; the driver manager loads ODBC drivers when the application requires them; the ODBC drivers process ODBC function calls, pass SQL statements on to the DBMS and return the results to the application; the data source is an object that hides the DBMS, the details of the network interface, the location and full name of the database, and so on.

The actions performed by an application using the ODBC interface are as follows. To start a database session, the application must connect to the data source that hides the actual database. The application then interacts with the database by sending SQL statements, requesting results, monitoring and responding to errors, and so on – that is, the standard pattern of interaction between an application and a database server, typical of the RDA model, applies. Importantly, the ODBC standard includes transaction-management functions (starting, committing and rolling back a transaction). After completing the session, the application must disconnect from the data source.
