What Is a Transaction
To start, here is an excerpt from the Wikipedia entry on Transaction:
A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.
A database transaction, by definition, must be atomic, consistent, isolated and durable. Database practitioners often refer to these properties of database transactions using the acronym ACID.
Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.
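Of all this, "all-or-nothing" states the essence of a transaction most clearly: either everything succeeds, or it is as if nothing ever happened. You can think of a transaction as a single atomic operation.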
HBase Transaction
HBase's transaction support is quite limited. The list of new features in version 0.94 includes this item:
[HBASE-5229] - Provide basic building blocks for "multi-row" local transactions.
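Here is how the NoSQLFan blog interprets it:
"Version 0.94 has more complete transaction support. Previously HBase provided row-level transactions, but each transaction could perform only one write operation. If you executed a series of Put and Delete operations one after another, each was a separate transaction, and the series as a whole was not atomic. In 0.94, Put and Delete can be executed atomically together in the same transaction. See proposal HBASE-3584."
So how is it actually used? Here is a sample: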
// Add API for atomic row mutations to HBase (currently Put and Delete).
// Client API would look like this:
Delete d = new Delete(ROW);
Put p = new Put(ROW);
//...
AtomicRowMutation arm = new AtomicRowMutation(ROW);
arm.add(p);
arm.add(d);
myHtable.atomicMutation(arm);
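As you can see, the transaction here covers only a series of Put/Delete operations on one particular row. A batch of operations that spans different rows or different tables cannot be placed in a single transaction.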
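To make the single-row guarantee concrete, here is a minimal toy model in plain Java. It is not HBase code and says nothing about how HBase implements this internally: `Row`, `Mutation`, `put` and `delete` are hypothetical names I made up for illustration. The batch is applied to a private copy of the row and swapped in only if every mutation succeeds, so readers see either all of the changes or none of them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SingleRowAtomicDemo {
    /** Hypothetical stand-in for a Put or Delete on one row. */
    interface Mutation {
        void applyTo(Map<String, String> cells);
    }

    static Mutation put(String column, String value) {
        return cells -> cells.put(column, value);
    }

    static Mutation delete(String column) {
        return cells -> cells.remove(column);
    }

    /** Toy row: column name -> value. Not an HBase class. */
    static final class Row {
        private Map<String, String> cells = new HashMap<>();

        /** Applies the whole batch or nothing at all. */
        synchronized void mutateAtomically(List<Mutation> batch) {
            Map<String, String> draft = new HashMap<>(cells);
            for (Mutation m : batch) {
                m.applyTo(draft); // if this throws, 'cells' is left untouched
            }
            cells = draft; // publish all changes in one step
        }

        synchronized Map<String, String> snapshot() {
            return new HashMap<>(cells);
        }
    }
}
```

The model covers exactly one `Row`. Nothing in it can tie a change in one row to a change in another row or table, which is the limitation described above.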
What We Need
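As discussed in my earlier post 《HBase多条件查询》 (HBase multi-condition queries), we usually have to build several index tables just to support our queries.
For example, when I save a player record, the rowKey of the main table (the one holding the complete information) is built from the ID, so rows are ordered by ID.
If I need a TOP 10 ranking by player score (Scores), I may also need an index table whose rowKey is (Max - player.getScores) + ID. Subtracting from Max makes higher scores sort first. By the same logic there may be several more such tables...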
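As a rough sketch of the (Max - player.getScores) + ID idea, the snippet below builds such an index key as a string. The names `ScoreIndexKey`, `MAX_SCORE` and `build`, the 9-digit width and the `_` separator are all my own assumptions for illustration. The one detail that really matters is the zero padding: HBase orders rows by the bytes of the rowKey, so the inverted score needs a fixed width for byte order to match numeric order.

```java
import java.util.Locale;
import java.util.TreeMap;

final class ScoreIndexKey {
    // Assumed upper bound on a score; the post only calls it "Max".
    static final long MAX_SCORE = 999_999_999L;

    /** Builds "(Max - score) + ID" as a fixed-width, sortable string key. */
    static String build(long score, String playerId) {
        if (score < 0 || score > MAX_SCORE) {
            throw new IllegalArgumentException("score out of range: " + score);
        }
        long inverted = MAX_SCORE - score; // higher score -> smaller number
        // Zero-pad to 9 digits so lexicographic order equals numeric order.
        return String.format(Locale.ROOT, "%09d_%s", inverted, playerId);
    }

    public static void main(String[] args) {
        // A TreeMap keeps keys sorted, standing in for the sorted index table.
        TreeMap<String, String> scoresTbl = new TreeMap<>();
        scoresTbl.put(build(1200, "p1"), "p1");
        scoresTbl.put(build(9800, "p2"), "p2");
        scoresTbl.put(build(4500, "p3"), "p3");
        // Iterating from the first key yields players from highest score down.
        scoresTbl.values().forEach(System.out::println);
    }
}
```

With keys like these, a TOP 10 query becomes a scan of the first ten rows of the index table.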
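Every time a player record is inserted, I have to put to both of these rowKeys.
That already goes beyond the transaction scope HBase supports (a series of operations on the same row).
What happens without a transaction here? Say I add a new player: IDTbl.put runs first, then ScoresTbl.put.
During the second step, ScoresTbl.put, the network connection to the HBase cluster drops. ScoresTbl.put times out and fails, but IDTbl.put has already completed successfully.
If rollback is handled only in the business layer, it would now issue IDTbl.delete, but with the network down the delete fails as well.
Then the network comes back!
Eventually the newly added player logs in (IDTbl has his record, so the login succeeds).
Clicking the "View my rank" button triggers ScoresTbl.get, but ScoresTbl has no record for him, and the effect on subsequent operations is unpredictable.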
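The failure scenario above can be replayed with a small in-memory simulation. This is only a sketch under my own assumptions: `FakeCluster` and `savePlayer` are hypothetical names, the two maps stand in for IDTbl and ScoresTbl, and the network outage is modeled as a counter of calls that still get through. None of this is the HBase client API.

```java
import java.util.HashMap;
import java.util.Map;

final class DualWriteDemo {
    /** Toy stand-in for a remote cluster holding both tables. */
    static final class FakeCluster {
        final Map<String, String> idTbl = new HashMap<>();
        final Map<String, String> scoresTbl = new HashMap<>();
        int callsUntilOutage = Integer.MAX_VALUE;

        private void call() {
            if (callsUntilOutage <= 0) {
                throw new IllegalStateException("network down");
            }
            callsUntilOutage--;
        }

        void put(Map<String, String> table, String key, String value) {
            call();
            table.put(key, value);
        }

        void delete(Map<String, String> table, String key) {
            call();
            table.remove(key);
        }

        void restoreNetwork() {
            callsUntilOutage = Integer.MAX_VALUE;
        }
    }

    /** Business-layer "transaction": two puts plus a best-effort rollback. */
    static boolean savePlayer(FakeCluster c, String id, String scoreKey) {
        c.put(c.idTbl, id, "profile");
        try {
            c.put(c.scoresTbl, scoreKey, id);
            return true;
        } catch (IllegalStateException putFailed) {
            try {
                c.delete(c.idTbl, id); // rollback goes over the same broken network
            } catch (IllegalStateException rollbackFailed) {
                // Nothing more the business layer can do here.
            }
            return false;
        }
    }
}
```

If `callsUntilOutage` is set to 1 before calling `savePlayer`, the first put lands, the second put and the rollback delete both fail, and after `restoreNetwork()` the player is present in `idTbl` but missing from `scoresTbl`.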
And what if we have a dozen or more index tables...
So transactions are a must!
What Can We Do
To solve this kind of transaction problem across different records and different tables, I came across two projects.