7 posts in the 'MySQL News & Article' category

  1. [MySQL] MySQL Query Performance Statistics in the Performance Schema
  2. [MySQL] Optimizer Enhancements in MySQL 5.7
  3. [MySQL] XFS and EXT4 Testing Redux
  4. [MySQL] Replace Oracle RAC with MariaDB Galera Cluster?
  5. [MySQL] Streaming backups onto the Cloud Storage
  6. [MySQL] Oracle or MySQL?
  7. [MySQL] How Is MySQL Doing These Days?





We’ve recently added the ability to monitor MySQL query performance statistics from MySQL’s PERFORMANCE_SCHEMA, and there were a number of lessons learned. There are definitely right and wrong ways to do it. If you are looking to the P_S tables for monitoring MySQL query performance, this blog post might save you some time and mistakes.

What Is The Performance Schema?

First, a quick introduction. The Performance Schema includes a set of tables that give information on how statements are performing. Most of the P_S tables follow a set of predictable conventions: there’s a set of tables with a limited set of full-granularity current and/or historical data, which is aggregated into tables that accumulate over time. In the case of statements, there’s a table of current statements, which feeds into a statement history, that accumulates into statement summary statistics. The tables are named as follows:

| events_statements_current                          |
| events_statements_history                          |
| events_statements_history_long                     |
| events_statements_summary_by_account_by_event_name |
| events_statements_summary_by_digest                |
| events_statements_summary_by_host_by_event_name    |
| events_statements_summary_by_thread_by_event_name  |
| events_statements_summary_by_user_by_event_name    |
| events_statements_summary_global_by_event_name     |

The tables most people will care about are events_statements_current, which is essentially a replacement for SHOW FULL PROCESSLIST, and events_statements_summary_by_digest, which contains statistics about classes of queries over time. The rest of the tables are pretty much what they look like – summaries by user, etc.

These tables were introduced in MySQL 5.6, and not all of them will be enabled by default. There’s a performance overhead to enable them, but this should be small relative to the performance improvements you can gain from using them.

We prefer our technique of decoding network traffic server-side to measure query performance, for several reasons, but the statement digest table is the next-best thing in cases such as Amazon RDS where that’s not possible. It gives us enough data to present a view of Top Queries as shown below.

[Screenshot: Top MySQL Query Performance]

Now let’s dig into specifics about these tables and how to use them.

Monitoring MySQL Performance - An Overview

The general idea, for most MySQL performance monitoring tools, is to read from events_statements_summary_by_digest at intervals and subtract each sample from the next, to get rates over time. As you can see in the sample below, there are a lot of columns with various statistics about each family of queries in the table:

mysql> select * from events_statements_summary_by_digest  limit 1\G
*************************** 1. row ***************************
                SCHEMA_NAME: customers
                     DIGEST: 4625121e18403967975fa86e817d78bf
                DIGEST_TEXT: SELECT @ @ max_allowed_packet 
                 COUNT_STAR: 36254
             SUM_TIMER_WAIT: 2683789829000
             MIN_TIMER_WAIT: 45079000
             AVG_TIMER_WAIT: 74027000
             MAX_TIMER_WAIT: 1445326000
              SUM_LOCK_TIME: 0
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 36254
          SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
 SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 0
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 0
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 0
              SUM_SORT_SCAN: 0
          SUM_NO_INDEX_USED: 0
     SUM_NO_GOOD_INDEX_USED: 0
                 FIRST_SEEN: 2014-09-12 16:04:38
                  LAST_SEEN: 2014-10-31 08:26:07

These columns are mostly counters that accumulate over time. The COUNT_STAR column, for example, shows the number of times the statement has been executed. The SUM_ columns are just what they look like.

Enabling And Sizing The Table

The table needs to be enabled as usual with the Performance Schema by using its setup table, setup_consumers. That table contains a row for each P_S consumer to be enabled. Other setup tables and server variables control some of the configuration as well, though the defaults work OK out of the box most of the time.
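As a minimal sketch (consumer names as they appear in stock MySQL 5.6), checking and enabling the statement consumers looks like this:

-- See which statement-related consumers are currently enabled
SELECT NAME, ENABLED
FROM performance_schema.setup_consumers
WHERE NAME LIKE '%statements%';

-- Enable digest aggregation plus the current-statements table
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME IN ('statements_digest', 'events_statements_current');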

The table can also be sized, in number of rows. By default it is 10,000 rows (although I think somewhere I saw a documentation page that said 200).
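The sizing knob is the performance_schema_digests_size server variable, which is read-only at runtime, so changing it means editing the configuration and restarting. A sketch:

-- Check the current size of the digest table
SHOW GLOBAL VARIABLES LIKE 'performance_schema_digests_size';

-- In my.cnf, picked up at the next restart:
-- [mysqld]
-- performance_schema_digests_size = 20000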

Limitations Of The Table

There are a couple of limitations you should be aware of.

  1. The statement digest table does not record anything about statements that are prepared. It only captures statements that are executed by sending the full SQL to the server as text. If you use prepared statements, the table probably does not capture your server’s performance accurately. The drivers for many programming languages use prepared statements by default, so this could be a real issue. (If this is a problem for you, you might like to know that VividCortex captures prepared statements from network traffic, including samples).
  2. The table is fixed-size, and resizing it requires a server restart.
  3. Some things aren’t captured in full granularity. For example, when we’re capturing MySQL query performance data from network traffic, we can measure specific error codes. There’s a SUM_ERRORS column in the table, but you can’t see what the error codes and messages were.

Resetting The Table (Or Not)

The table can be reset with a TRUNCATE to start afresh, but generally shouldn’t be. Why would you want to do this? There might be a few reasons.

First, the table is fixed-size, and if the table isn’t large enough to hold all of the distinct types of queries your server runs, you’ll get a catch-all row with a NULL digest and schema. This represents statements that aren’t being tracked separately, and might be important for some reason. A TRUNCATE will empty the table if this is the case.
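You can check whether you're hitting this by looking for the catch-all row directly; if its COUNT_STAR is large and growing, the table is undersized:

SELECT SCHEMA_NAME, DIGEST, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST IS NULL;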

Second, statistics accumulate over time, so columns such as first-seen and last-seen dates may eventually end up being useless to you. The min, max, and average timer waits will not be very helpful over long periods of time, either.

Finally, you might want to reduce the number of rows it contains, so that occasional queries that are never purged don’t introduce performance overhead when reading the table.

There are tools that do this completely wrong, though. Some of them empty out the table every time they read from it. This is the worst behavior because these tables are not session-specific. They are global, and a TRUNCATE will affect everyone who’s looking at them. It might be kind of rude to constantly throw away the data your colleague (or another tool) is looking at.

The other problem with this is that a tool that reads from the table, then truncates it, is subject to race conditions. Statements that complete between these actions will be discarded and never seen. Of course, there’s no way to avoid this, except by just not doing it, or not doing it often.

I would suggest resetting this table only manually and only when needed, or perhaps at infrequent intervals such as once a day or once an hour, from a scheduled task.
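If you do reset it, the statement itself is a one-liner, best run manually or from an infrequent scheduled job, with the caveats above in mind:

-- Global effect: this clears the statistics everyone else is reading, too
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;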

Accumulating Statements Correctly

The table's primary key isn't defined in the schema, but there is a unique set of columns. Contrary to what I've seen some software assume, it is not the DIGEST column alone. There is one row per digest, per schema: the combination of schema and digest is unique.

This means that if you're looking for all information about a single class of queries regardless of schema, you need to aggregate together all of the rows with the same DIGEST.
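For example, to roll a single query class up across schemas, you would aggregate on DIGEST alone – though, as discussed below, a GROUP BY like this is something to run sparingly, not once a second:

SELECT DIGEST,
       SUM(COUNT_STAR)     AS executions,
       SUM(SUM_TIMER_WAIT) AS total_wait_ps
FROM performance_schema.events_statements_summary_by_digest
GROUP BY DIGEST;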

One of the implications of the uniqueness being defined by schema and digest together is that servers that have a large number of schemas and a large number of digests will need a really huge number of rows to keep track of all of the statements. At VividCortex, we have customers whose servers have literally millions or tens of millions of distinct families of queries running on a regular basis. Multiply this by a large number of schemas, and you have no hope of keeping track of them with the P_S tables. This is not a problem for our default collection mechanism, though: by default we capture MySQL query performance statistics by decoding the server’s network traffic. This handles high-cardinality scenarios without trouble.

Don’t Run GROUP BY On The Table Frequently

There are several ways you can cause performance impact to the server by reading from the P_S tables.

One is by using complex queries on these tables. They are in-memory, but they’re not indexed. For example, if you run a GROUP BY to aggregate rows together by digest, you’ll cause trouble. You probably shouldn’t do this, at least not frequently.

Recall that VividCortex measures everything at 1-second resolution, giving you highly detailed performance statistics about your entire system. The statement statistics are no different; we have per-second statement (query) statistics. Reading from the P_S table once per second with a GROUP BY clause has too much performance impact on the server. It is less costly to read the entire table and accumulate the statistics in application code, in our tests.

Don’t Re-Fetch Data

Another way to cause problems is to fetch the DIGEST_TEXT column with every query. This column isn’t enormous, because it’s limited to 1kb in length, but it’s still large enough that you should not repeatedly fetch it. Instead, when you see an unknown digest, you should query for it only then. This may introduce a lot of complexity into your application code, but this is what needs to be done.
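In other words, the frequent polling query should select only the cheap numeric columns, and the text should be fetched on demand. A sketch, reusing the digest from the sample output above:

-- Frequent poll: numeric columns only, no DIGEST_TEXT
SELECT SCHEMA_NAME, DIGEST, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest;

-- Only when a digest is seen for the first time:
SELECT DIGEST_TEXT
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST = '4625121e18403967975fa86e817d78bf'
LIMIT 1;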

Handle Results Smartly

We care a lot about performance at VividCortex, obviously. We’ve built our approach to performance management with minimal overhead in mind. This is why we don’t do things like enable your server’s slow query log or poll commands that can block the server. But in addition to avoiding overhead on the server, we have to make our agent’s performance good, too.

This is why we also do things you might consider extreme. For example, MySQL returns all of the numbers from the Performance Schema in textual format over the network, and these then have to be converted into numbers by the client. Even ASCII-to-number conversion has a cost we can measure, so we don't do it unless the COUNT_STAR column has changed since the last time we saw a row. This makes a material difference in the agent's CPU consumption.

Capturing Sample Statements

Capturing samples of queries is very helpful. Aggregate statistics about groups of queries aren’t revealing enough; you need to be able to look at specific instances of queries to EXPLAIN them and so on.

To get samples, you’ll need to look at the current or historical statement tables. Not all of these are enabled by default, though.

Some tools LEFT JOIN against the statement history tables to get samples of individual query executions. This obviously should not be done at 1-second frequency. Even if it’s done infrequently, it’s not really a great idea. When you have fine-detailed performance instrumentation you can really see small spikes in server performance, and an occasionally intrusive query can potentially starve high-frequency fast-running queries of resources.
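If you do fetch samples this way, a targeted, infrequent lookup against the history table is gentler than a habitual LEFT JOIN. A sketch, reusing the digest from earlier, that grabs the most recent occurrence still in the history:

SELECT SQL_TEXT, TIMER_WAIT, CURRENT_SCHEMA
FROM performance_schema.events_statements_history_long
WHERE DIGEST = '4625121e18403967975fa86e817d78bf'
ORDER BY EVENT_ID DESC
LIMIT 1;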

VividCortex’s approach to this, by the way, is to collect samples probabilistically, which is different from the usual practice of trying to find a “worst” sample. Worst-sample is okay in some ways, but it is not representative, so it doesn’t help you find out much about a broad spectrum of queries and their execution. Here’s a screenshot of what our sampling approach yields, which is quite different from the picture you’ll get from worst-sample tactics:

[Screenshot: probabilistically sampled queries]

Conclusions

Although we prefer to be able to capture and decode network traffic to see the full detail of what's happening inside the server, in cases where that's not possible, the Performance Schema in MySQL 5.6 and greater is a good alternative. There are just a few things one should take care to do, especially when reading at 1-second resolution as we do at VividCortex, and a few common mistakes to avoid that will either cause bad behavior or make your results just plain wrong.

If you have suggestions, comments, or questions, please leave them below!






The MySQL optimizer is getting better. MySQL 5.6 introduced:

  • File sort optimizations with small limit
  • Index Condition Pushdown
  • Batched Key Access and Multi Range Read
  • Postponed Materialization
  • Improved Subquery execution
  • EXPLAIN for Insert, Update, and Delete
  • Optimizer Traces
  • Structured EXPLAIN in JSON format

This was in addition to the InnoDB storage engine now offering improved statistics collection, leading to more stable query plans.

In Evgeny Potemkin's session at MySQL Connect titled "MySQL's EXPLAIN Command New Features", two new features for 5.7 were announced. They are both incredibly useful, so I wanted to write a little about them.

EXPLAIN FOR CONNECTION

Normally with EXPLAIN, you find the execution plan of a query you intend to run, and then interpret the output as you see fit.

What MySQL 5.7 adds is the ability to see the execution plan of a query currently running in another connection, e.g.:

EXPLAIN FORMAT=JSON FOR CONNECTION 2;
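The connection id here is the Id column from SHOW PROCESSLIST (or CONNECTION_ID() within a session), so a minimal sketch of the workflow is:

-- In one session: find the id of the connection running the interesting statement
SHOW PROCESSLIST;

-- In another session: inspect the plan of what that connection is executing
EXPLAIN FORMAT=JSON FOR CONNECTION 2;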

Why it's useful:

  • Plans can change depending on input parameters. For example, WHERE mydate BETWEEN '2013-01-01' AND '2013-01-02' may use an index, but WHERE mydate BETWEEN '2001-01-01' AND '2013-10-17' may not.
  • Plans can change as data changes.
  • Plans can also change depending on the context of a transaction, with InnoDB offering multi-version concurrency control.
  • Optimizer statistics can change, and it's not impossible that the executing query is slow because of it. It's great to have conclusive proof and be able to rule this out.

Execution cost in EXPLAIN

MySQL uses cost based optimization to pick the best query execution plan when there are multiple choices available. It is very similar to how a GPS navigator adds up estimated time and picks the best route to a destination.

What this feature does is expose the cost as a numeric value when running EXPLAIN FORMAT=JSON. To take an example using the world sample database:

mysql [localhost] {msandbox} (world) > EXPLAIN FORMAT=JSON SELECT City.* 
FROM City INNER JOIN Country ON City.countrycode=Country.code 
ORDER BY City.NAME ASC LIMIT 100\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4786.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2151.00"
      },
      "nested_loop": [
        {
          "table": {
            "table_name": "country",
            "access_type": "index",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "Code"
            ],
            "key_length": "3",
            "rows_examined_per_scan": 239,
            "rows_produced_per_join": 239,
            "filtered": 100,
            "using_index": true,
            "cost_info": {
              "read_cost": "6.00",
              "eval_cost": "47.80",
              "prefix_cost": "53.80",
              "data_read_per_join": "61K"
            },
            "used_columns": [
              "Code"
            ]
          }
        },
        {
          "table": {
            "table_name": "City",
            "access_type": "ref",
            "possible_keys": [
              "CountryCode"
            ],
            "key": "CountryCode",
            "used_key_parts": [
              "CountryCode"
            ],
            "key_length": "3",
            "ref": [
              "world.country.Code"
            ],
            "rows_examined_per_scan": 9,
            "rows_produced_per_join": 2151,
            "filtered": 100,
            "cost_info": {
              "read_cost": "2151.00",
              "eval_cost": "430.20",
              "prefix_cost": "2635.00",
              "data_read_per_join": "151K"
            },
            "used_columns": [
              "ID",
              "Name",
              "CountryCode",
              "District",
              "Population"
            ]
          }
        }
      ]
    }
  }
}

Why it's useful:

  • This exposes more transparency into optimizer decisions. DBAs can better understand what part of a query is considered expensive, and try to optimize. I think this is important, because I have heard a lot of DBAs make blanket recommendations like "joins are bad" or "sorting is bad", but there needs to be context on how much data needs to be sorted. It makes us all speak the same language: estimated cost.
  • Cost refinement is an ongoing effort. Alongside the arrival of fast new SSD storage, MySQL keeps introducing new optimizations (such as index condition pushdown). Not all of these optimizations will be the best choice every time, and MySQL should ideally be able to make the right choice in every situation.







In the post concluding my earlier testing, I declared EXT4 the winner vs. XFS for my scenario. My coworker, @keyurdg, was unwilling to let XFS lose out and made a few observations:

  • XFS wasn’t *really* being formatted optimally for the RAID stripe size
  • XFS wasn’t being mounted with the inode64 option which means that all of the inodes are kept in the first 2TB. (Side note: inode64 option is default in newer kernels but not on CentOS 6’s 2.6.32)
  • Single-threaded testing isn’t entirely accurate: although replication is single-threaded, InnoDB collects the writes and then flushes them to disk using multiple threads, governed by innodb_write_io_threads.

Armed with new data, I have – for real – the last round of testing.

To keep things a bit simpler, I will be comparing each file system on 2TB and 27TB, with 4 threads, which matches the default value for innodb_write_io_threads in MySQL 5.5.

| FS   | RAID | Size | Mount Options                        | Transfer/s   | Requests/s | Avg/Request | 95%/Request |
| xfs  | 10   | 2T   | noatime,nodiratime,nobarrier,inode64 | 62.588Mb/sec | 4005.66    | 0.88ms      | 0.03ms      |
| ext4 | 10   | 2T   | noatime,nodiratime,nobarrier         | 58.667Mb/sec | 3754.66    | 0.87ms      | 0.19ms      |
| xfs  | 10   | 27T  | noatime,nodiratime,nobarrier,inode64 | 64.47Mb/sec  | 4126.06    | 0.84ms      | 0.02ms      |
| ext4 | 10   | 27T  | noatime,nodiratime,nobarrier         | 49.379Mb/sec | 3160.26    | 1.06ms      | 0.24ms      |

XFS finally wins out clearly over EXT4. XFS being dramatically slower on 27T earlier really shows how much worse inode32 performs than inode64, and explains why XFS looked comparatively better on 2T. Fixing the formatting options pushed XFS over the top easily.

All that’s left to do is set up multiple instances until replication can’t keep up anymore.






Erkan Yanar

If you want to avoid downtime in your business, High Availability (HA) is a strong requirement; by definition, it makes it possible to access your data all the time without losing (any) data. In this blog we compare two alternatives: Oracle RAC and MariaDB Galera Cluster.

There are several options to implement High Availability. Oracle RAC is a popular and proven HA solution. HA can also be enabled for your data and systems with load balancers that make it possible to always access your data. MariaDB Galera Cluster provides similar functionality using synchronous multi-master Galera replication. It is also easier to build and proves to be more cost-effective. Being open source, you may have to pay for support, but not for running the system.

Next, the designs of Oracle RAC and MariaDB Galera Cluster are going to be compared, so you can make up your mind on your own.

Oracle RAC

With RAC, Oracle instances run on separate nodes, while the data is located on shared storage. All instances access the same files.

To prevent conflicts, the instances must agree on which instance is actually working on a block of data. If a node wants to change a row, it must get exclusive access to that block and store it in its cache. It therefore asks the other nodes whether they have the block. If no other node does, it gets the block from storage.

Even for read access, all the nodes need to communicate in the same way to get the data, just as they do for writing. When the block is being modified, the requesting nodes get a consistent-read version of it (which they are not allowed to modify). This adds latency due to internode communication – there will be such round trips on every read and write access whenever a node does not have the block.

The need for communication between the nodes for every access to a table adds overhead. On the other hand, because all blocks are acquired in advance, locks that are local on a single node, e.g. for SELECT FOR UPDATE, become cluster-wide locks.

The advantage of RAC is that losing an Oracle node does not harm the service at all. The other nodes will keep providing data (HA of the access). Therefore, you can shut down a node to perform maintenance tasks such as upgrading hardware or software, while reducing unexpected downtime. However, the shared storage - responsible for the data - is a potential single point of failure.

On Oracle RAC distributing read or write access is not optimal because latency is added by additional internode round trips. The best results occur when the application only accesses a fixed part of the data per node, so that no blocks have to be moved around, but it makes the setup more complicated.

MariaDB Galera Cluster

In contrast to Oracle RAC, MariaDB Galera Cluster is a high availability setup with shared-nothing architecture. Instead of having one shared storage (SAN or NAS), every cluster member has its own copy of all the data, thus eliminating the single point of failure.

MariaDB Galera Cluster takes care of syncing data even for new nodes. This makes managing the cluster easy: adding an empty node to the cluster is sufficient, and MariaDB Galera Cluster will provide all the data for the new node.

Unlike Oracle RAC, accessing a node to read data does not result in internode communication. Instead, communication (and so latency) happens at the time transactions are committed. This is faster than the Oracle RAC approach of acquiring all blocks in advance, but this also means conflicts are found at the time a transaction is committed.

Those conflicts are detected by the internode communication at commit time. That is why the same data should not be accessed (at least not at the same time) on different nodes: it increases the chance of conflicts. Accessing the data on different nodes one after another is fine – whereas in Oracle RAC, the blocks would have to be copied. A SELECT FOR UPDATE statement, for example, can fail on COMMIT, because it locks the data locally but not cluster-wide, so conflicts with transactions on other nodes are only found when the transaction commits. This is slightly different from Oracle RAC, where accessing the same data on another node at any later time moves the blocks.
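As a sketch (with a hypothetical table t), the failure mode looks like this:

-- Node 1: lock a row locally
START TRANSACTION;
SELECT * FROM t WHERE id = 1 FOR UPDATE;

-- Node 2: meanwhile, another transaction changes the same row and commits
UPDATE t SET val = val + 1 WHERE id = 1;

-- Node 1: the conflict only surfaces at commit time
UPDATE t SET val = val + 1 WHERE id = 1;
COMMIT;  -- fails with a deadlock error when certification detects the conflict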

While Oracle RAC has a lot of latency moving data blocks into the cache of every node, MariaDB Galera Cluster has an increased likelihood of failing commits.

Like Oracle RAC, single nodes in a MariaDB Galera Cluster can be taken down for maintenance without stopping the cluster. When a node rejoins the cluster, it automatically gets the missing transactions via Incremental State Transfer (IST), or it may sync all data using State Snapshot Transfer (SST). If the missing transactions are still in a node's local (configurable) cache, IST is used; if not, SST is used.
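From the SQL side you can watch a node's membership and sync state, and the IST cache (gcache) size is part of the provider options. A sketch (exact option strings depend on your Galera version):

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';          -- nodes currently in the cluster
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';   -- e.g. Synced, Donor/Desynced, Joining
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';   -- includes gcache.size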

One drawback of the current Galera version is that Data Definition Language (DDL) commands (CREATE, ALTER, DROP) run synchronously on the cluster, so the entire cluster stalls until a DDL command finishes. That's why Magento installations running the default configuration do not scale at all on MariaDB Galera Cluster. In general, tools like pt-online-schema-change can bypass this limitation. Eliminating it is on the development roadmap.

In comparison

Oracle RAC and MariaDB Galera Cluster provide similar functionality with different designs. Each eliminates maintenance downtime for many tasks and thus gives you more freedom to run your applications.

In general Oracle RAC has a lot more latency, because internode communication (including moving all requested data blocks) happens for both read and write access. In MariaDB Galera Cluster, the changed dataset is sent around at commit time – so only changed datasets are sent at all.

Despite the obvious similarities, the two databases have quite different architectures. Oracle RAC uses shared storage, while MariaDB Galera Cluster uses a shared-nothing architecture, which is less expensive. Oracle RAC's shared storage is quite expensive – the author has typically seen EMC or NetApp used for it, because the storage is the single point of failure and therefore something reliable is needed.

Data in MariaDB Galera Cluster is replicated on all the nodes, which makes it easy to run the cluster spread over different regions. Consequently, your data will be safe even if your datacenter burns down. To get this level of redundancy with Oracle RAC, you need shared storage to match, e.g. a NetApp MetroCluster. Besides adding cost, NetApp MetroCluster requires a network with a round-trip latency of less than 10ms, while MariaDB Galera Cluster even runs in cloud environments across different regions.

With Oracle RAC there are two inherent sources of latency: accessing the shared storage, and internode communication for read and write access. With MariaDB Galera Cluster there is latency on every COMMIT, when internode communication certifies and distributes the data being committed.

Of course MariaDB Galera Cluster is no one-to-one replacement for Oracle RAC. But if your application runs with either Oracle or MySQL/MariaDB, MariaDB Galera Cluster is more than an alternative.


About the Author

Erkan Yanar

Erkan Yanar is an independent consultant with a strong focus on MySQL, Docker/LXC and OpenStack. He loves to give presentations and also writes for trade magazines.








MySQL Enterprise Backup (MEB) has been widely used as the most efficient tool to take backups of huge databases. Backups can be stored on premises – viz., on local hard disks, external hard disk drives, or network-mounted disks. MEB can also stream backups to tape, supporting Media Management Software such as Symantec NetBackup, EMC2, and Oracle Secure Backup. Backups can also be streamed to another machine using 'ssh'.

As data keeps growing, instead of expanding their storage, database users are increasingly choosing to rely on the growing trend: cloud storage.

A few facts about cloud storage:

1. Offers off-premises storage with AAA (Anytime, Anywhere Access).
2. Makes it easy to share data for collaboration.
3. Highly fault-tolerant, durable, scalable, and secure.
4. Cost-effective.

To the benefit of our MySQL database users, MySQL Enterprise Backup (MEB) now supports taking backups and storing them directly in the cloud.

MEB 3.10.2 provides this great feature of storing image backups in the cloud with simplified command-line options. The backups can be restored from the cloud as well.

How does it work?


[Figure: MEB streaming a backup from a remote machine or EC2 instance to Amazon S3]

As illustrated in the picture, MEB can be hosted alongside the server, either on a remote machine or on an Amazon EC2 instance, to take backups of MySQL databases/tables, which are then stored in Amazon S3.

The cloud options introduced are:

  • --cloud-service : Cloud service for data backup or restoration. Currently, only the Amazon S3 service is supported, and “s3” is the only value mysqlbackup accepts for this option.
  • --cloud-bucket : The storage bucket on Amazon S3 for the backup image.
  • --cloud-object-key : The Amazon S3 object key for the backup image.
  • --cloud-access-key-id : AWS access key ID for logging onto Amazon S3.
  • --cloud-secret-access-key : AWS secret access key that goes with the AWS access key id used in --cloud-access-key-id.
  • --cloud-aws-region : Region for Amazon Web Services that mysqlbackup accesses for S3.
  • --cloud-trace : Print trace information for cloud operations. It works independently of --trace, which specifies the trace level for the non-cloud operations of mysqlbackup. Any non-zero value for the option enables the trace function. Default value is “0.”
  • --cloud-proxy=proxy-url:port : Proxy address and port number for overriding the environment's default proxy settings for accessing Amazon S3.


Let's dive into the mysqlbackup command line with the options used for cloud storage.

For taking a simple image backup of the whole server:

./mysqlbackup --cloud-service=s3 --cloud-bucket=mebtest --cloud-object-key=cloud_image.bi --cloud-access-key-id=accessKeyId --cloud-secret-access-key=******** --cloud-aws-region=awsRegion --backup-dir=/backuptmp --backup-image=- backup-to-image  

For taking an image backup of selected tables:

This is very useful for giving other users access to specific tables. As you are aware, those users can restore the tables onto their own running server.

./mysqlbackup --cloud-service=s3 --cloud-bucket=mebtest --cloud-object-key=cloud_image.bi --cloud-access-key-id=accessKeyId --cloud-secret-access-key=******** --cloud-aws-region=awsRegion --include-tables=db1.tab1 --use-tts=with-full-locking --backup-dir=/backuptmp --backup-image=- backup-to-image

MEB also supports compressed, encrypted and incremental image backups onto the cloud.

For example, to perform an encrypted image backup, the command-line is:

./mysqlbackup --cloud-service=s3 --cloud-bucket=mebtest --cloud-object-key=cloud_image.bi --cloud-access-key-id=accessKeyId --cloud-secret-access-key=******** --cloud-aws-region=awsRegion --encrypt --key-file=encryptKeyFile --backup-dir=/backuptmp --backup-image=- backup-to-image

All operations supported for non-cloud image backups can be performed.
Here is the list:

extract
image-to-backup-dir
validate
copy-back-and-apply-log

The image stored on the cloud can be directly restored onto a (running) server.

For restoring the selective image backup into a running server:

./mysqlbackup --defaults-file=/backuptmp/backup-my.cnf --cloud-service=s3 --cloud-bucket=mebtest --cloud-object-key=cloud_image.bi --cloud-access-key-id=accessKeyId --cloud-secret-access-key=******** --cloud-aws-region=awsRegion --backup-dir=/restoretmp --backup-image=- --datadir=/datadir copy-back-and-apply-log
Note: this assumes the image backup was taken with the --use-tts option.

For more options and usage samples, please refer to the MEB manual.

Thus MEB helps the user take backups and stream them to the cloud using a simple command line, WITHOUT the need for any external tool or script.






Companies weigh many criteria when selecting a database.

One criterion that consistently ranks near the top, however, is cost.

Compared with Oracle, MySQL reaches customers at a very low price.


Strictly speaking, a direct comparison of the two databases is not quite fair.

The first reason is that Oracle and MySQL serve different markets.

Oracle is designed for enterprise customers with large budgets and complex business requirements.

MySQL, by contrast, is a low-cost database most commonly used for database-backed websites and non-critical applications.

Even the same feature may well be implemented differently in each (e.g., row-level locking, locking algorithms).

With row-level locking, a row being updated is locked for writing so that other users cannot access the data mid-update. Oracle incurs no extra overhead as the number of locks grows. MySQL's locking overhead, on the other hand, rises with the number of locks, because it manages its locks in in-memory data structures.


What are Oracle's distinguishing features?

Oracle's features and strengths fall into five broad categories.


1. Oracle Management Server

 - Enables centralized administration and monitoring, and tuning of multiple databases.

 - Can be shared with other admin users.

2. Oracle Change Manager

 - Lets you draw up a change plan and see the effect of the changes before actually applying them.

 - Does not disturb production systems.

3. Administrative Alerts

 - When an error occurs, Oracle can notify you by e-mail or through a configured account.

 - Alerts can be suppressed during scheduled downtime.

4. Capacity Planning

 - Tracks usage patterns to help administrators plan upgrades.

 - Makes bottlenecks easy to identify.

5. Query Optimizer

 - With its query optimizer, Oracle chooses the most efficient way to execute a SQL statement.

 - It analyzes tables and indexes to minimize cost (cost-based from Oracle 10g onward).


What are MySQL's distinguishing features?


1. Its biggest trait is that it is easier to use than other DBMSs.

2. There are many free GUI tools, such as phpMyAdmin.

3. It has very little overhead.

 - MySQL uses as little as 1 MB of RAM on a laptop.

 - An Oracle 9i installation, by comparison, uses 128 MB.

4. It has begun to support advanced features.

 - Stored procedures, triggers, views, sub-queries, transactional tables, cascading update & delete.

 - The third-party InnoDB storage engine can be used.


Depending on the size of your company, you will end up choosing either the Oracle or the MySQL DBMS.

Even if you later need to migrate, an Oracle Migration Kit is available.

There are also cases where the two DBMSs are used in combination.

Beyond these technical traits, there are differences in syntax, timestamps, concatenation, and so on.

You can look them up at http://troels.arvin.dk/db/rdbms/.


Which database is the better fit?

For a non-critical environment at an attractive price point you would pick MySQL, while for rock-solid reliability and stable, mature features you might pick Oracle.

Of course, it also depends on the workload you have to handle.




[MySQL] How Is MySQL Doing These Days?




The hot topic these days is the database market.

Since around 2010, as data volumes have grown sharply, big data has dominated the conversation.

For that reason, open-source NoSQL has likewise become the biggest issue since the advent of SQL.

As many people already know, MySQL is open source, but it is gradually failing to keep its open-source lineage alive.


MySQL began in 1979 as UNIREG, a terminal-interface library at the Swedish company TcX.

Then, in 2000, the key people who developed MySQL at TcX (Monty and David) spun off into a company called MySQL AB; at the same time the license changed to the GPL, and in 2006 the licensing policy finally took its current form.


MySQL comes in two editions, the 'Enterprise' edition and the 'MySQL Community' edition, and the one usable without a separate license is the 'MySQL Community' edition.

In the end, it has become open source in name only rather than genuinely open source.


The two editions shared the same source code, differing only in how frequently releases and patches arrived. Starting with the MySQL 5.5 GA (Generally Available) release in February 2011, however, the Enterprise edition's source code was closed even to paying customers.


As an open-source project, MySQL was counted among the godfathers of the open-source world right up until Sun acquired it. That changed once Oracle, with its aggressive M&A strategy, acquired Sun in turn. As many experts have noted in their columns, MySQL's open-source character is steadily fading.


With Oracle's massive M&A, large numbers of open-source users who had relied on MySQL have been moving from MySQL to MariaDB.

MariaDB is a DBMS newly created, and actively developed, by Monty Widenius, the central figure behind MySQL, after he left.


It is already ahead of the Oracle-stalled MySQL in performance, and migration from MySQL is compatible without changes to application code or database tables.


Major Linux distribution vendors increasingly prefer MariaDB over MySQL and are switching to it.

Even large websites such as Wikipedia and Google have begun replacing MySQL with MariaDB.


Along with the future of open source more broadly, MySQL and MariaDB will be worth watching closely.


