
Setting up HBase

  [IT168] Note: the latest HBase shell has dropped HQL support, so the insert and query SQL statements in the material below no longer work. For efficiency reasons, only a few methods such as get, put, and scan are now provided for working with data.

URL: http://hadoop.apache.org/hbase/docs/r0.1.1/api/overview-summary.html
This setup builds on an HDFS cluster that has already been created.
1: Edit hadoop/contrib/hbase/conf/hbase-env.sh
and add the JAVA_HOME path.

2: Edit hadoop/contrib/hbase/conf/hbase-site.xml and add the following properties:

<property>
<name>hbase.master</name>
<value>10.0.4.121:11100</value>
<description>The host and port that the HBase master runs at.</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://10.0.4.121:10100/hbase</value>
<description>The directory shared by region servers.</description>
</property>

3: Start HBase
hadoop/contrib/hbase/bin/start-hbase.sh

4: See http://wiki.apache.org/hadoop/Hbase/HbaseShell for shell operations.
4.1 First enter the shell: hadoop/contrib/hbase/bin/hbase shell
4.2 Create a table
create table offer(image_big,image_small);
4.3 Insert, query, and delete data
For example:
insert into offer(image_big:,image_small:) values ('abcdefg','abc') where row = 'testinsert';
insert into offer(image_big:,image_small:) values ('hijklmn','hij') where row = 'testinsert';
insert into offer(image_big:content,image_big:path,image_small:content,image_small:path) values ('abcdefg','path_big','abc','path_small') where row = 'testinsert';
insert into offer(image_big:content,image_big:path,image_small:content,image_small:path) values ('hijklmn','path_big','hij','path_small') where row = 'testinsert';

select * from offer where row = 'testinsert';
+---------------------+------------+
| Column              | Cell       |
+---------------------+------------+
| image_big:          | hijklmn    |
+---------------------+------------+
| image_big:content   | hijklmn    |
+---------------------+------------+
| image_big:path      | path_big   |
+---------------------+------------+
| image_small:        | hij        |
+---------------------+------------+
| image_small:content | hij        |
+---------------------+------------+
| image_small:path    | path_small |
+---------------------+------------+

select count(*) from offer where row = 'testinsert';
1 row(s) in set. (0.02 sec)
As shown above, although we ran four inserts, the count is 1. HBase overwrote the cells that share the same row and column: insert 2 overwrote insert 1, and insert 4 overwrote insert 3, which effectively works as an update. The shell introduction also confirms that HQL provides no update statement.
The data at this point should look like this:
+------------+-------------------------------+-----------------------------+
|            | Column image_big              | Column image_small          |
| key        +---------+----------+----------+-----+----------+------------+
|            | :       | :content | :path    | :   | :content | :path      |
+------------+---------+----------+----------+-----+----------+------------+
| testinsert | hijklmn | hijklmn  | path_big | hij | hij      | path_small |
+------------+---------+----------+----------+-----+----------+------------+
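The "overwrite" behavior above can be pictured with a toy model: each cell keeps timestamped versions in a sorted map, and a read without a timestamp returns the newest entry. This is only an illustration of the observed semantics in plain Java (the class and method names are made up), not HBase's actual implementation:

```java
import java.util.TreeMap;

// Toy model of a single HBase cell: versions keyed by timestamp.
public class CellModel {

    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Every write stores a new version under its timestamp.
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A read without a timestamp returns the newest version,
    // so later writes appear to "overwrite" earlier ones.
    public String getLatest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        CellModel cell = new CellModel();
        cell.put(1209982310285L, "abcdefg"); // first insert
        cell.put(1209982311285L, "hijklmn"); // second insert
        System.out.println(cell.getLatest()); // prints "hijklmn"
    }
}
```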
What happens if the inserts include a TIMESTAMP?
delete * from offer where row = 'testinsert';

insert into offer(image_big:,image_small:) values ('abcdefg','abc') where row = 'testinsert' timestamp '1209982310285';
insert into offer(image_big:,image_small:) values ('hijklmn','hij') where row = 'testinsert' timestamp '1209982311285';
insert into offer(image_big:content,image_big:path,image_small:content,image_small:path) values ('abcdefg','path_big','abc','path_small') where row = 'testinsert' timestamp '1209982312285';
insert into offer(image_big:content,image_big:path,image_small:content,image_small:path) values ('hijklmn','path_big','hij','path_small') where row = 'testinsert' timestamp '1209982313285';

The result, whether you run
select * from offer where row = 'testinsert';
or
select * from offer where row = 'testinsert' timestamp '1209982310285';
is the same:
+---------------------+------------+
| Column              | Cell       |
+---------------------+------------+
| image_big:          | hijklmn    |
+---------------------+------------+
| image_big:content   | hijklmn    |
+---------------------+------------+
| image_big:path      | path_big   |
+---------------------+------------+
| image_small:        | hij        |
+---------------------+------------+
| image_small:content | hij        |
+---------------------+------------+
| image_small:path    | path_small |
+---------------------+------------+

This puzzled me. The HBase Architecture document does describe timestamps, with data versioned by time, so how should this be understood?
The thread at http://www.mail-archive.com/core-user@hadoop.apache.org/msg00222.html suggests timestamps are not really supported yet, but my timestamped inserts succeeded here. My own understanding is that, looking at the results, both row and timestamp are index-level attributes that sit outside the data itself, so not displaying them is no problem; but the data seems to get overwritten. Perhaps it really is unsupported for now...
First delete:
delete * from offer where row = 'testinsert';
then select again:
select * from offer where row = 'testinsert';
+---------------------+------------+
| Column              | Cell       |
+---------------------+------------+
| image_big:          | abcdefg    |
+---------------------+------------+
| image_big:content   | abcdefg    |
+---------------------+------------+
| image_big:path      | path_big   |
+---------------------+------------+
| image_small:        | abc        |
+---------------------+------------+
| image_small:content | abc        |
+---------------------+------------+
| image_small:path    | path_small |
+---------------------+------------+
This unexpected result shows that the data is versioned; the history just was not reachable through search, and the timestamp condition in select seems to have no effect: every query returns the newest data. The architecture document does say that if an insert carries no timestamp, the system adds the current time by default.
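The delete-then-select result above suggests a model in which delete removes only the newest version of a cell, exposing the one before it. The sketch below (hypothetical names, plain Java, no HBase involved) reproduces exactly the behavior observed above; whether real HBase deletes work this way is precisely the open question in this experiment:

```java
import java.util.TreeMap;

// Toy model of a versioned cell where deleting the newest version
// exposes the previous one, matching the behavior observed above.
public class VersionedCell {

    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Reads always return the newest surviving version.
    public String getLatest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Remove only the newest version; older history is kept.
    public void deleteLatest() {
        if (!versions.isEmpty()) {
            versions.remove(versions.lastKey());
        }
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(1209982310285L, "abcdefg");
        cell.put(1209982311285L, "hijklmn");
        cell.deleteLatest();
        System.out.println(cell.getLatest()); // prints "abcdefg"
    }
}
```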

5: Client access to HBase
As with the earlier HDFS client, put hbase-site.xml and the lib jars on the classpath; the code is as follows:

package com.chua.hadoop.client;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.SortedMap;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

/**
 * HBase client example: download two images, store them in the
 * offer table, then read one back.
 * @author chua 2008-5-4
 */
public class HBase {

    public static void main(String[] args) throws Exception {
        String domain = "www.dlog.cn";
        String path_s = "/uploads/m/me/meichua/meichua_100.jpg";
        String path_b = "/uploads/m/me/meichua/200804/22094433_tLuyw.jpg";
        byte[] data_s = getData(domain, path_s);
        byte[] data_b = getData(domain, path_b);

        HBaseConfiguration config = new HBaseConfiguration();
        HTable table = new HTable(config, new Text("offer"));
        createRecord(table, "chua", "image_big", data_b, path_b);
        createRecord(table, "chua", "image_small", data_s, path_s);

        // Fetch all cells of one row and iterate over the column names.
        SortedMap map = table.getRow(new Text("chua"));
        if (!map.isEmpty()) {
            Iterator it = map.keySet().iterator();
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
        // Fetch the data of one column of the row and save it locally.
        byte[] data = table.get(new Text("chua"), new Text("image_big:content"));
        saveAsFile(data, "c:/chua_big.jpg");
    }

    public static void createRecord(HTable table, String row, String column, byte[] data, String path) throws IOException {
        long lockId = table.startUpdate(new Text(row));
        table.put(lockId, new Text(column + ":content"), data);
        table.put(lockId, new Text(column + ":path"), path.getBytes());
        table.commit(lockId);
    }

    /**
     * Download an image over HTTP.
     * @param domain
     * @param path
     * @return
     */
    public static byte[] getData(String domain, String path) {
        byte[] dataResource = null;
        try {
            HttpClient client = new HttpClient();
            client.getHostConfiguration().setHost(domain, 80, "http");
            GetMethod getMethod = new GetMethod(path);
            int status = client.executeMethod(getMethod);
            if (status == 200) {
                dataResource = getMethod.getResponseBody();
            }
            getMethod.releaseConnection();
        } catch (Exception e) {
            System.out.println("Download error: " + e);
        }
        return dataResource;
    }

    /**
     * Read bytes from a local file.
     * @param path
     * @return
     */
    public static byte[] getData(String path) {
        File file = new File(path);
        DataInputStream dis = null;
        try {
            dis = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            byte[] data = new byte[(int) file.length()];
            dis.readFully(data); // read the whole file, not just the buffered portion
            return data;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        } finally {
            if (dis != null) {
                try { dis.close(); } catch (IOException e) { /* ignore */ }
            }
        }
    }

    /**
     * Save bytes to a file.
     * @param data
     * @param path
     */
    public static void saveAsFile(byte[] data, String path) {
        if (data != null) {
            try {
                BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(path));
                out.write(data);
                out.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
Output:
image_big:content
image_big:path
image_small:content
image_small:path
The above is a fairly simple example of a client accessing HBase.
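The structure the client walks (row key, then "family:qualifier" column name, then value) can be pictured as nested sorted maps. A minimal sketch of that mental model, with made-up names and no HBase dependency:

```java
import java.util.Set;
import java.util.TreeMap;

// Toy picture of an HBase table: a sorted map of rows, where each
// row is a sorted map from "family:qualifier" column names to bytes.
public class TableModel {

    private final TreeMap<String, TreeMap<String, byte[]>> rows = new TreeMap<>();

    public void put(String row, String column, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    public byte[] get(String row, String column) {
        TreeMap<String, byte[]> cols = rows.get(row);
        return cols == null ? null : cols.get(column);
    }

    // Analogous to iterating over the keySet of table.getRow(...).
    public Set<String> columnsOf(String row) {
        return rows.getOrDefault(row, new TreeMap<>()).keySet();
    }

    public static void main(String[] args) {
        TableModel table = new TableModel();
        table.put("chua", "image_big:content", new byte[] { 1, 2 });
        table.put("chua", "image_big:path", "path_big".getBytes());
        System.out.println(table.columnsOf("chua")); // column names, sorted
    }
}
```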
