Understanding Common Techniques for Data Query Optimization and Testing

By and

Date: Jun 26, 2023

Return to the article

In this sample chapter from CompTIA Data+ DA0-001 Exam Cram discover the secrets to maximizing the efficiency of your data queries. Dive into the world of query optimization, execution plans, parametrization, indexing, temporary tables, and subsets of records.

This chapter covers query optimization and testing. It provides details related to query optimization, parametrization, indexing, temporary tables in a query set, subsets of records, and execution plans.

Query Optimization

A computer has computing resources such as a central processing unit (CPU), memory, disk space (storage), an arithmetic logical unit (ALU), optionally a graphics processing unit (GPU), and many more hardware components. When SQL queries are executed against databases, they take up CPU, memory, and other resources. Depending on the complexity of queries, some might take much longer than others to yield results. Performance of queries can be improved in multiple ways—two basic ones being improving or optimizing hardware performance and/or optimizing queries. This chapter is dedicated to query optimization.

Query optimization is an activity that is conducted by a query optimizer in a DBMS to select the best available strategy for executing the query. In other words, query optimization provides a way to get maximum performance for results generated by queries in the minimum amount of time.

A query optimizer can leverage a number of methods for optimizing queries. One of the ways a query optimizer works is by leveraging the cost of executing a query.

So, how do you really go about optimizing queries? Well, there are a number of ways; however, they all involve database performance tuning, which is usually the full-time job of database administrators (DBAs). Consider that large databases might have a lot of data that requires tuning; this tuning leads to optimization of queries and much faster results than can be obtained compared to an untuned database. One of the key things to understand is the execution plan. A query execution plan or an execution plan is a logical outlay of the resources allocated as well as the operations performed when a query is being executed. The execution plan outlines the logical steps executed in a query. (You will learn more about query execution plans in the next section.)

At a high level, you need to know the following when considering query optimization and database tuning:

These tweaks can help queries perform better and yield faster outcomes, which is what the users want.

Execution Plans

Everything in life needs a plan. It is often said that “failing to plan is planning to fail.” An execution plan works behind the scenes to ensure that a query gets all the right resources and is being executed correctly; the plan outlines the steps for the query execution from start until output.

This section shows examples using MySQL Workbench for database creation and query execution. The database used is Azure Database for MySQL Flexible Server.

In the case of MySQL, the EXPLAIN statement gives insights into how MySQL executes statements; that is, the EXPLAIN keyword outlines the query execution plan. It works with the following SQL statements:

Figure 8.1 shows a query run in MySQL Workbench against the sample database available at https://github.com/Azure-Samples/mysql-database-samples/blob/main/mysqltutorial.org/mysql-classicmodesl.sql.

Figure 8.1 MySQL SELECT Statement Output

Now you can select Query > Explain Current Statement, as shown in Figure 8.2.

Figure 8.2 MySQL EXPLAIN Statement Output

MySQL Workbench gives you the query execution plan, including the query cost, as shown in Figure 8.3.

Figure 8.3 Query Execution Plan Visual Output

Figure 8.4 shows the details of the query execution plan.

Figure 8.4 Query Execution Plan Details

As you can see, an execution plan provides details that can be very helpful for optimizing queries.

Parametrization

A parametrized query enables you to use placeholders for parameters, where the parameter values are supplied at execution time. Why have parametrized queries at all? Well, there are a few benefits to using these queries:

Next, let’s look at an example of a parametrized query, using the database available at https://github.com/Azure-Samples/mysql-database-samples/blob/main/mysqltutorial.org/mysql-classicmodesl.sql.

This is the original (non-parametrized) query:

SELECT * FROM customers

Figure 8.5 shows the output of this query.

Figure 8.5 Output of a Non-parametrized Query

Now you can use the following parametrized query:

PREPARE stmt FROM 'SELECT * from customers WHERE customerName = ?';
SET @A = 'La Rochelle Gifts';

EXECUTE stmt using @A;

Figure 8.6 shows the output of this query.

Figure 8.6 Output of a Parametrized Query

What is happening in this scenario? A value is being passed to the query, and the query uses that value to process the output. As shown in Figure 8.5, every column and every row in the table are returned. In comparison, as shown in Figure 8.6, for the row where the value matches the value sent in @A, every column is shown in the query output.

Indexing

When you want to find information about a certain topic in a book, what’s the easiest way to find it? You could just flip the pages, but it’s usually much faster to use the table of contents or the index to pinpoint the page that contains the information you want. Similarly, with databases and queries, to get optimized output when you execute a query, indexes help locate the information quickly, thus improving the user experience.

So, how do indexes work? Well, quite simply, indexes help speed up the execution of queries by rapidly finding records and delivering all the columns requested by a query without executing a full table scan. It is helpful to prepare a comprehensive indexing plan that outlines the indexes required for each table, presents the sequence in which the indexes should be created, and outlines the columns to be selected for indexing. Figure 8.7 shows the indexes for the sample database.

Figure 8.7 Indexes in a MySQL Sample Database

In this case, you can see that the index type is BTREE, which stands for B-tree (or balanced tree).

As illustrated in Figure 8.8, a B-tree is formed of nodes, with the tree starting at a root, which has no parent node. Every other node in the tree has one parent node and may or may not have child nodes. A node that does not have any child nodes is called a leaf node.

Figure 8.8 B-tree Structure

Indexes can be classified into two major types:

So you know that creating indexes helps improve the efficiency of read operations by reducing the amount of data that a query needs to process. Are indexes used automatically, or do you need to define an index when running a query? Well, when you execute a query, the query optimizer generates an execution plan—and this execution plan should contain the index or indexes.

So, when indexes are created, what are some of the things to look for? Generally, you need to consider the following aspects:

Figure 8.9 Unused Indexes in MySQL Workbench

You can explicitly create indexes when it makes sense to do so. In MySQL, you can create an index as follows:

CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name USING [BTREE | HASH
| RTREE] ON table_name (column_name [(length)] [ASC | DESC])

In this case, you execute a query like this:

CREATE UNIQUE INDEX cu_number USING BTREE ON customers(customerNumber)

Figure 8.10 shows the outcome of this query.

Figure 8.10 Creating a New Unique Index

As you can see, the other index created— cust_number—is not unique and therefore has the Non_unique option set to 1. The statement used for that index is:

CREATE INDEX cust_number ON customers(customerNumber)

You can also create an index in Microsoft SQL by using the following statement:

CREATE UNIQUE INDEX index_name ON table_name (column_name1,
column_name2, ...);

Temporary Table in a Query Set

At times, you may need to create temporary space for items being moved from one place to another or when sorting out stuff in a database. In SQL, temporary tables are very similar to placeholders and offer workspaces for the transitional results you get when processing data.

To see how temporary tables work, let’s consider an example. In this example, you are going to create a temporary table called allcustomers that is based on an original table customers.

Here’s the MySQL statement you need to use to create the temporary table:

CREATE TEMPORARY TABLE allcustomers SELECT * FROM customers

Now, let’s compare the results of two queries:

select * from customers

and:

select * from allcustomers

Figure 8.11 shows the results of the first query.

Figure 8.11 Results of the Query select * from customers

Figure 8.12 shows the results of the second query.

Figure 8.12 Results of the Query select * from allcustomers

Now, let’s look at all the tables in the database by leveraging the following command:

show tables

Figure 8.13 shows the output of this command.

Figure 8.13 Results of the show tables Command

As you can see, the customers table shows up, but the temporary table you created (allcustomers) doesn’t.

Remember that temporary tables enable you to keep temporary data; in addition, you can reuse these tables several times in a particular session. Like other SQL applications, MySQL deletes temporary tables automatically when the current session is closed or the database connection is terminated.

Consider the following points about temporary tables:

There are two types of temporary tables that you can create in Microsoft SQL:

Notice that for a global temporary table you used ## in front of the table name whereas for a local temporary table you used # in front of the table name.

Subsets of Records

A subset is essentially a smaller set of data from a larger database or data warehouse that you can use to focus on only the relevant information. Database subsetting involves creating referentially intact portions of records from larger datasets into a separate database. For example, say that you are working with a database that has hundreds of records on customers from around the globe. If you are interested only in the customers in Europe, you can work with only the subset of data that pertains to the customers based in Europe.

Figure 8.14 illustrates the idea of data subsetting.

Figure 8.14 Data Subsetting Overview

There are several reasons you might want to work with subsets of data or records:

As you can see, there are a few really good reasons to leverage data subsetting, including improving performance, allowing parallel tasks at lower infrastructure costs, and taking advantage of abstraction. Moreover, subsetting gives developers and data engineers freedom to do their tests against subsets of data rather than whole databases. Data subsetting is an important aspect that leads to effective data quality management.

How exactly do you perform data subsetting? Well, two of the most famous methods are data sharding and data partitioning. Both allow you to break up data into smaller, more manageable slices, improving the performance of reads.

Data sharding, which is very common in scalable databases, allows shards (that is, logical slices of data) to be stored on multiple compute nodes. In contrast, data partitioning allows you to share shards on one compute node. Data sharding allows you to store new logical slices of data, called shards, across multiple nodes to achieve horizontal scalability and improved performance. When creating shards, you are essentially creating schema replicas and dividing the data stored in each shard based on a shard key.

Sharding is done horizontally, and partitioning is done vertically, as shown in Figure 8.15.

Sharding and partitioning of data both enable you to create data subsets. In the example shown in Figure 8.15, partitioning allows you to focus on CustomerID and customer details as well as product relationships, and sharding allows you to segregate customers by number of products.

Figure 8.15 Data Sharding and Partitioning

What Next?

If you want more practice on this chapter’s exam objective before you move on, remember that you can access all of the Cram Quiz questions on the Pearson Test Prep software online. You can also create a custom exam by objective with the Online Practice Test. Note any objective you struggle with and go to that objective’s material in this chapter.

800 East 96th Street, Indianapolis, Indiana 46240

vceplus-200-125    | boson-200-125    | training-cissp    | actualtests-cissp    | techexams-cissp    | gratisexams-300-075    | pearsonitcertification-210-260    | examsboost-210-260    | examsforall-210-260    | dumps4free-210-260    | reddit-210-260    | cisexams-352-001    | itexamfox-352-001    | passguaranteed-352-001    | passeasily-352-001    | freeccnastudyguide-200-120    | gocertify-200-120    | passcerty-200-120    | certifyguide-70-980    | dumpscollection-70-980    | examcollection-70-534    | cbtnuggets-210-065    | examfiles-400-051    | passitdump-400-051    | pearsonitcertification-70-462    | anderseide-70-347    | thomas-70-533    | research-1V0-605    | topix-102-400    | certdepot-EX200    | pearsonit-640-916    | itproguru-70-533    | reddit-100-105    | channel9-70-346    | anderseide-70-346    | theiia-IIA-CIA-PART3    | certificationHP-hp0-s41    | pearsonitcertification-640-916    | anderMicrosoft-70-534    | cathMicrosoft-70-462    | examcollection-cca-500    | techexams-gcih    | mslearn-70-346    | measureup-70-486    | pass4sure-hp0-s41    | iiba-640-916    | itsecurity-sscp    | cbtnuggets-300-320    | blogged-70-486    | pass4sure-IIA-CIA-PART1    | cbtnuggets-100-101    | developerhandbook-70-486    | lpicisco-101    | mylearn-1V0-605    | tomsitpro-cism    | gnosis-101    | channel9Mic-70-534    | ipass-IIA-CIA-PART1    | forcerts-70-417    | tests-sy0-401    | ipasstheciaexam-IIA-CIA-PART3    | mostcisco-300-135    | buildazure-70-533    | cloudera-cca-500    | pdf4cert-2v0-621    | f5cisco-101    | gocertify-1z0-062    | quora-640-916    | micrcosoft-70-480    | brain2pass-70-417    | examcompass-sy0-401    | global-EX200    | iassc-ICGB    | vceplus-300-115    | quizlet-810-403    | cbtnuggets-70-697    | educationOracle-1Z0-434    | channel9-70-534    | officialcerts-400-051    | examsboost-IIA-CIA-PART1    | networktut-300-135    | teststarter-300-206    | pluralsight-70-486    | coding-70-486    | freeccna-100-101    | digitaltut-300-101    | iiba-CBAP    | virtuallymikebrown-640-916    | isaca-cism    | whizlabs-pmp    | techexams-70-980    | ciscopress-300-115    | techtarget-cism    | pearsonitcertification-300-070    | testking-2v0-621    | isacaNew-cism    | simplilearn-pmi-rmp    | simplilearn-pmp    | educationOracle-1z0-809    | education-1z0-809    | teachertube-1Z0-434    | villanovau-CBAP    | quora-300-206    | certifyguide-300-208    | cbtnuggets-100-105    | flydumps-70-417    | gratisexams-1V0-605    | ituonline-1z0-062    | techexams-cas-002    | simplilearn-70-534    | pluralsight-70-697    | theiia-IIA-CIA-PART1    | itexamtips-400-051    | pearsonitcertification-EX200    | pluralsight-70-480    | learn-hp0-s42    | giac-gpen    | mindhub-102-400    | coursesmsu-CBAP    | examsforall-2v0-621    | developerhandbook-70-487    | root-EX200    | coderanch-1z0-809    | getfreedumps-1z0-062    | comptia-cas-002    | quora-1z0-809    | boson-300-135    | killtest-2v0-621    | learncia-IIA-CIA-PART3    | computer-gcih    | universitycloudera-cca-500    | itexamrun-70-410    | certificationHPv2-hp0-s41    | certskills-100-105    | skipitnow-70-417    | gocertify-sy0-401    | prep4sure-70-417    | simplilearn-cisa    |
http://www.pmsas.pr.gov.br/wp-content/    | http://www.pmsas.pr.gov.br/wp-content/    |