PythonTip >> 博文 >> python

MySQL Schema设计(五)用Python管理字符集

zihua 2014-02-11 15:02:28 点击: 845 | 收藏


一提字符集,可能有人会说,不管天崩地裂,全用utf8,整个世界都清净了。但某些字符集是需要更多CPU、消费更多的内存和磁盘空间、甚至影响索引使用,这还不包括令人蛋碎的乱码。可见、我们还是有必要花点时间略懂下MySQL字符集。

先直观认识各阶梯下显示使用字符集:

# 囊括三个层级:DB、Table、Column

mysql> create database d charset utf8;
Query OK, 1 row affected (0.04 sec)

mysql> create table d.t
    -> (str varchar(10) charset latin1)
    -> default charset=utf8;
Query OK, 0 rows affected (0.05 sec)

那如果没有显示指定?MySQL是如何设置?兵分两路:
     
  ① 创建对象时的默认设置
        
     这是个逐层继承的默认设置:
     Server → DB → Table → Column
     高层为底层设置默认值、底层可遵可弃
     没有指定字符集、谓之可遵
     显示指定字符集、谓之可弃
        
  ② 服务器和客户端通信时的设置
        
     当客户端提交一条SQL到MySQL时、MySQL Server总是假定客户端字符集是character_set_client
     其后、Server把character_set_client转为character_set_connection进行SQL处理、
     在返回结果集给客户端时、Server又将character_set_connection转为character_set_result、然后返回
        
     以上涉及的三个字符集、我们可以通过set names 一次搞定
        
     
字符集之间的相互转换是需要额外的系统开销的、如何知道?explain extended + show warnings 即可。那该如何尽量避免这种隐式转换?这里介绍一种被称为"极简原则"的方法、如下:
先为服务器(或数据库)选择合适的字符集、然后依据业务、让某些列选择合适的字符集
在MySQL字符集中隐含了些意外惊喜、主要有三:
① 有趣的character_set_database
  
  当character_set_database和character_set_server不同时、库的默认字符集由后者决定
  你不能直接修改csd、改变css就改变了csd、因为csd和库默认字符集相同、
  改变库默认字符集、csd就随之改变、而css决定库的默认字符集
  所以、当连接到mysql实例、又没有指定库时、默认字符集与css相同
  
② load data infile 
  
  进行此操作时、建议最佳实践如下:
  use 库;
  set names 字符集;
  开始加载数据;
  这就使用统一字符集、避免混搭的"字符集style"
  
③ select into outfile 
  
  该行为没有进行任何转码操作!

㈠ 显示字符集

表结构:
mysql> desc sakila.actor;
+-------------+----------------------+------+-----+-------------------+-----------------------------+
| Field       | Type                 | Null | Key | Default           | Extra                       |
+-------------+----------------------+------+-----+-------------------+-----------------------------+
| actor_id    | smallint(5) unsigned | NO   | PRI | NULL              | auto_increment              |
| first_name  | varchar(45)          | NO   |     | NULL              |                             |
| last_name   | varchar(45)          | NO   | MUL | NULL              |                             |
| last_update | timestamp            | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------+----------------------+------+-----+-------------------+-----------------------------+
4 rows in set (0.00 sec)

数据库连接模块:

[root@DataHacker ~]# cat dbapi.py
#!/usr/bin/env ipython
#coding = utf-8
#Author: linwaterbin@gmail.com
#Time: 2014-1-29

import MySQLdb as dbapi

USER = 'root'
PASSWD = 'oracle'
HOST = '127.0.0.1'
DB = 'sakila'

conn = dbapi.connect(user=USER,passwd=PASSWD,host=HOST,db=DB)
查看版本:
[root@DataHacker ~]# ./show_charset.py --version
1.0
查看帮助:
[root@DataHacker ~]# ./show_charset.py -h
Usage: show_charset.py [options] <arg1> <arg2> [<arg3>...]

Options:
  --version   show program's version number and exit
  -h, --help  show this help message and exit
  -d DB_NAME  Database name(leave blank is all Databases)
  -t T_NAME   Table name (leave blank is all tabless)
  -c C_NAME   Column name(leave blank is all columns)
我们要的效果:
[root@DataHacker ~]# ./show_charset.py -d sakila -t actor
sakila.actor.first_name:     utf8     utf8_general_ci
sakila.actor.last_name:     utf8     utf8_general_ci

细心的朋友或许已经看出actor_id与last_update这两列并没有被统计,由此我们也可以确定,只有基于字符的值才有字符集的概念

在5.6更是直截了当:

mysql> create table tt (str char(2) charset utf8);
Query OK, 0 rows affected (0.20 sec)

mysql> create table tt (str int(11) charset utf8);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'charset utf8)' at line 1
代码:
[root@DataHacker ~]# cat show_charset.py
#!/usr/bin/env python

from optparse import OptionParser
from dbapi import conn
import MySQLdb
# 函数一:命令行参数输入
def parse_options():
  parser = OptionParser(usage="%prog [options] <arg1> <arg2> [<arg3>...]",version='1.0',)
  parser.add_option("-d",dest="db_name",help="Database name(leave blank is all Databases)")
  parser.add_option("-t",dest="t_name",help="Table name (leave blank is all tabless)")
  parser.add_option("-c",dest="c_name",help="Column name(leave blank is all columns)")
  return parser.parse_args()

# 主功能实现:显示字符集
def show_charsets():
  query="""select * from information_schema.columns
            where table_schema not in ('mysql','INFORMATION_SCHEMA')
              and character_set_name is not null"""
  #三个if条件实现过滤
  if options.db_name:
    query += " and table_schema='%s'" % options.db_name
  if options.t_name:
    query += " and table_name='%s'" % options.t_name
  if options.c_name:
    query += " and column_name='%s'" % options.c_name
  #默认返回值形式是元组,我们通过属性cursors.DictCursor转为字典
  cur = conn.cursor(MySQLdb.cursors.DictCursor)
  cur.execute(query)
  for record in cur.fetchall():
    character_set_name = record['CHARACTER_SET_NAME']
    collation_name = record['COLLATION_NAME']
    print "%s.%s.%s:\t%s\t%s" % (record['TABLE_SCHEMA'],record['TABLE_NAME'],record['COLUMN_NAME'],character_set_name,collation_name)
  cur.close()

#采用try-finally形式关闭数据库连接  
try:
    options,args = parse_options()
    show_charsets()
finally:
    conn.close()

㈡ 修改列的字符集

查看帮助:
[root@DataHacker ~]# ./modify.py -h
Usage: 
modify.py schema_name.table_name.column_name new_charset_name [new_collate_name]

Options:
  --version   show program's version number and exit
  -h, --help  show this help message and exit
我们要的效果:
#修改前
mysql> show create table testdb.t\G;
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(10) CHARACTER SET latin1 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

#修改
[root@DataHacker ~]# ./modify.py testdb.t.name gbk
successfully executed:
     alter table testdb.t modify column name varchar(10) CHARSET gbk

#修改后
mysql> show create table testdb.t\G;
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(10) CHARACTER SET gbk DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.01 sec)
实现代码如下:
[root@DataHacker ~]# cat modify.py 
#!/usr/bin/env python

import MySQLdb
from dbapi import *
from optparse import OptionParser

#这里省略掉option值,只要求输入args
def parse_options():
  parser = OptionParser(usage="\n%prog schema_name.table_name.column_name new_charset_name [new_collate_name]",version='1.0',)
  return parser.parse_args()

#主程序
def modify_column():
  cur = conn.cursor(MySQLdb.cursors.DictCursor)
  v_sql = """
          select * from information_schema.columns 
           where table_schema='%s' 
             and table_name='%s'
             and column_name='%s'""" % (schema_name,table_name,column_name)
  cur.execute(v_sql)
  row = cur.fetchone()
  #当row为null时,程序请求检查column是否存在
  if not row:
    print "please check schema_name.table_name.column_name whether exists ?"
    exit(1)
  column_type = row['COLUMN_TYPE']
  column_default = row['COLUMN_DEFAULT']
  is_nullable = (row['IS_NULLABLE'] == 'YES')
  query = "alter table %s.%s modify column %s %s" % (schema_name,table_name,column_name,column_type)
  query += " CHARSET %s" % new_charset
  if collation_supplied:
    query += "COLLATE %s" % new_collation
  if not is_nullable:
    query += "NOT NULL"
  if column_default:
    query += "DEFAULT '%s'" % column_default

  try:
    alter_cur = conn.cursor()
    alter_cur.execute(query)
    print "successfully executed:\n \t%s" % query
  finally:
    alter_cur.close()

  cur.close()

try:
  (options,args) = parse_options()
  if not 2<= len(args) <=3:
    print "Usage: schema_name.table_name.column_name new_charset_name [new_collate_name]"
    exit(1)
  column_tokens = args[0].split(".")
  if len(column_tokens) != 3:
    print "column must in the following format: schema_name.table_name.column_name"
    exit(1)
  
  schema_name,table_name,column_name = column_tokens
  new_charset = args[1]
  collation_supplied = (len(args) == 3)
  if collation_supplied:
    new_collation = args[2]

  modify_column()
finally:
  if conn:
    conn.close()
By DataHacker
2014-2-11
Good Luck!
原文链接:http://www.tuicool.com/articles/zQBF3u

作者:zihua | 分类: python | 标签: python | 阅读: 845 | 发布于: 2014-02-11 15时 |