Wednesday, May 6

MySQL Data Types

in this tutorial, you will learn about MySQL data types and how to use them effectively in the MySQL database design.
Database table contains multiple columns with specific data types such as numeric or string. MySQL provides more data types other than just numeric or string. Each data type in MySQL can be determined by the following characteristics:
  • Kind of values it can represent.
  • The space that takes up and whether the values are fixed-length or variable-length.
  • Does the values of the data type can be indexed.
  • How MySQL compares the value of a specific data type.

Numeric Data Types

You can find all SQL standard numeric types in MySQL including exact number data type and approximate numeric data types including integer, fixed-point and floating point. In addition, MySQL also supports BIT data type for storing bit field values. Numeric types can be signed or unsigned except the BIT type. The following table shows you the summary of numeric types in MySQL:
Numeric TypesDescription
TINYINTA very small integer
SMALLINTA small integer
MEDIUMINTA medium-sized integer
INTA standard integer
BIGINTA large integer
DECIMALA fixed-point number
FLOATA single-precision floating-point number
DOUBLEA double-precision floating-point number
BITA bit field

String Data Types

In MySQL, string can hold anything from plain text to binary data such as images and files. String can be compared and searched based on pattern matching by using the LIKE operator or regular expression. The following table shows you the string data types in MySQL:
String TypesDescription
CHARA fixed-length non-binary (character) string
VARCHARA variable-length non-binary string
BINARYA fixed-length binary string
VARBINARYA variable-length binary string
TINYBLOBA very small BLOB (binary large object)
BLOBA small BLOB
MEDIUMBLOBA medium-sized BLOB
LONGBLOBA large BLOB
TINYTEXTA very small non-binary string
TEXTA small non-binary string
MEDIUMTEXTA medium-sized non-binary string
LONGTEXTA large non-binary string
ENUMAn enumeration; each column value may be assigned one enumeration member
SETA set; each column value may be assigned zero or more set members

Understanding MySQL Table Types, or Storage Engines

In this tutorial, you will learn various MySQL table types, or storage engines. It is essential to understand the features of each table type in MySQL so that you can use them effectively to maximize the performance of your databases.
MySQL provides various storage engines for its tables as below:
  • MyISAM
  • InnoDB
  • MERGE
  • MEMORY (HEAP)
  • ARCHIVE
  • CSV
  • FEDERATED
Each storage engine has its own advantages and disadvantages. It is crucial to understand each storage engine features and choose the most appropriate one for your tables to maximize the performance of the database. In the following sections we will discuss about each storage engine and its features so that you can decide which one to use.

MyISAM

MyISAM extends the former ISAM storage engine. The MyISAM tables are optimized for compression an speed. MyISAM tables are also portable between platforms and OSes.
The size of MyISAM table can be up to 256TB, which is huge. In addition, MyISAM tables can be compressed into read-only tables to save space. At startup, MySQL checks MyISAM tables for corruption and even repair them in case of errors. The MyISAM tables are not transaction-safe.
Before MySQL version 5.5, MyISAM is the default storage engine when you create a table without explicitly specify the storage engine. From version 5.5, MySQL uses InnoDB as the default storage engine.

InnoDB

The InnoDB tables fully support ACID-compliant and transactions. They are also very optimal for performance. InnoDB table supports foreign keys, commit, rollback, roll-and forward operations. The size of the InnoDB table can be up to 64TB.
Like MyISAM, the InnoDB tables are portable between different platforms and OSes. MySQL also checks and repair InnoDB tables, if necessary, at startup.

MERGE

A MERGE table is a virtual table that combines multiple MyISAM tables, which has similar structure, into one table. The MERGE storage engine is also known as the MRG_MyISAM engine. The MERGE table does not have its own indexes; it uses indexes of the component tables instead.
Using MERGE table, you can speed up performance in joining multiple tables. MySQL only allows you to perform SELECT, DELETE, UPDATE and INSERT operations on the MERGE tables. If you use DROP TABLE statement on a MERGE table, only MERGE specification is removed. The underlying tables will not be affected.

Memory

The memory tables are stored in memory and used hash indexes so that they are faster than MyISAM tables. The lifetime of the data of the memory tables depends on the up time of the database server. The memory storage engine is formerly known as HEAP.

Archive

The archive storage engine allows you to store a large number of records, which for archiving purpose, into a compressed format to save disk space. The archive storage engine compresses a record when it is inserted and decompress it using zlib library as it is read.
The archive tables only allow INSERT and SELECT commands. The archive tables do not support indexes, so reading records requires a full table scanning.

CSV

The CSV storage engine stores data in comma-separated values file format. A CSV table brings a convenient way to migrate data into non-SQL applications such as spreadsheet software.
CSV table does not support NULL data type and read operation requires a full table scan.

FEDERATED

The FEDERATED storage engine allows you to manage data from a remote MySQL server without using cluster or replication technology. The local federated table stores no data. When you query data from a local federated table, the data is pull automatically from the remote federated tables.

Secrets of Redo Logging in InnoDB

Why InnoDB engine need Redo Log Files ?

InnoDB is a general-purpose storage engine that balances high reliability and high performance. It is a transactional storage engine and is fully ACID compliant, as would be expected from any relational database. The durability guarantee provided by InnoDB is made possible by the redo logs.


How it is generated  ?

InnoDB (.ibd files) are considered to be a sequence of equal sized pages. These pages are uniquely identified within the InnoDB system using the space_id, page_no combination. If we want to read or modify any page, it needs to be loaded into memory. So there are two copies of the pages - on disk and in memory.

1. Any changes to a page is first done to the in-memory copy of the page. The page that is modified in memory and not yet flushed to disk is marked as the dirty page in memory.
2. An associated redo log is generated in memory, in the local mini transaction (mtr) buffer. This will then be transferred to the global in-memory redo log buffer.
3. The redo log record is written from the redo log buffer in memory to the redo log file on disk. This is subsequently flushed. These two steps are considered separate - writing to the redo log file and flushing the redo log file to the disk. This is to account for the file buffering done by the operating system.
4. The dirty page is flushed from memory to disk at some later point of time as part of the checkpointing operation.The order of these steps are important. The redo log record of a change must be flushed to disk before flushing the corresponding dirty page to the disk. This is the concept of write-ahead logging (WAL).

How it works ?

By default, InnoDB creates two redo log files (or just log files) ib_logfile0 and ib_logfile1 within the data directory of MySQL. In MySQL versions 5.6.8 and above, the default size of each redo log file is 48 MB each. This can be configured by the user by making use of innodb_log_file_size server option. The number of log files is controlled byinnodb_log_files_in_group server option.
A log group consists of a number of log files, each of same size. As of MySQL 5.6, InnoDB supports only one log group. So I'll not discuss this further.
The redo log files are used in a circular fashion. This means that the redo logs are written from the beginning to end of first redo log file, then it is continued to be written into the next log file, and so on till it reaches the last redo log file. Once the last redo log file has been written, then redo logs are again written from the first redo log file.
The log files are viewed as a sequence of blocks called "log blocks" whose size is given by OS_FILE_LOG_BLOCK_SIZE which is equal to 512 bytes. Each log file has a header whose size is given by LOG_FILE_HDR_SIZE, which is defined as 4*OS_FILE_LOG_BLOCK_SIZE.

Terminology 

Redo Log File Header

Each redo log file contains a header occupying four log blocks with the following information:
The first 4 bytes contain the log group number to which the log file belongs.
The next 8 bytes contain the lsn of the start of data in this log file.
First checkpoint field located in the beginning of the second log block.
Second checkpoint field located in the beginning of the fourth log block.

Log Blocks 

A redo log file can be viewed as a sequence of log blocks. All log blocks, except the ones belonging to the log file header, contain a header and a footer.
The log block header contains the following information:
The log block number. This field is of 4 bytes.
Number of bytes of log written to this block. This field is of 2 bytes.
Offset of the first start of an mtr log record group in this log block or 0 if none
The checkpoint number to which this log block belongs
The log block trailer contains checksum of the log block contents.

Log Sequence Number (LSN)

It’s a most talked terminology while among Database Admins :) The LSN is an offset into the redo log file. Within InnoDB the log sequence number is represented by the type lsn_t, which is an 8-byte unsigned integer. The LSN is what that links the dirty page, the redo log record, and the redo log file. Each redo log record when it is copied to the in-memory log buffer, it gets an associated LSN. When each database page is modified, redo log records are generated. So each database page is also associated to an LSN. The page lsn is stored in a header for each page. The page lsn gives the lsn upto which the redo log file must be flushed before flushing the page.

Global In-memory Redo Log Buffer

The in-memory redo log buffer is global and all redo logs generated by user transactions will be written into this buffer. The size of this buffer is configurable and is given by the innodb_log_buffer_size. The default size of this redo log buffer is 8MB.

  • When it comes into picture ?
  • Running transaction is modifying database contents
  • This log buffer will be written or flushed to the log file, either when the transaction commits
  • When the log buffer gets full


Mini transaction (mtr)

A mini transaction (mtr) must be used to generate all the redo log records. A mini transaction contains a local buffer (called the mini transaction buffer) into which the generated redo log records will be stored. If we need to generate a group of redo log records such that either all make it to the redo log file or none makes it, then we need to put them in a single mini transaction. Apart from the redo log records, the mini transaction also maintains a list of pages that has been modified (dirty pages).

Redo Log Record Types

When we modify a database page, a redo log record is generated. This redo log record either contains what information has been changed in the page (physical redo log) or how to perform the change again (logical redo log). InnoDB uses a combination of physical redo logs and logical redo logs.

Life cycle of a redo log record

The life cycle of a redo log record is as follows:


  • The redo log record is first created by a mini transaction and stored in the mini transaction buffer. It has information necessary to redo the same operation again in the time of database recovery.
  • When mtr_commit() is done, the redo log record is transferred to the global in-memory redo log buffer. If necessary, the redo log buffer will be flushed to the redo log file, to make space for the new redo log record.
  • The redo log record has a specific lsn associated with it. This association is established during mtr_commit(), when the redo log record is transferred from mtr buffer to log buffer. Once the lsn is established for a redo log record, then its location in the redo log file is also established.
  • The redo log record is then transferred from the log buffer to redo log file when it is written + flushed. This means that the redo log record is now durably stored on the disk.
  • Each redo log record has an associated list of dirty pages. This relationship is established via LSN. A redo log record must be flushed to disk, before its associated dirty pages. A redo log record can be discarded only after all the associated dirty pages are flushed to the disk.
  • A redo log record can be used to re-create its associated list of dirty pages. This happens during database recovery.


It should be helpful to understand concept of redo logging which makes InnoDB a dependent transaction safe engine to keep our data in safe hands .