Using Partitioned Tables in Hive - 喵喵知识园

Creating a partitioned table

Create a single-level partitioned table:

            > create table dept_partition(
            > deptno int,dname string,loc string)
            > partitioned by (month string)
            > row format delimited fields terminated by '\t';
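This single-level partitioned table uses one column, named month and of type string, as the partition key. The value of month decides which partition a given batch of data is stored in.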

Load data into the partitioned table:

hive (test_6_14)> load data local inpath '/opt/storage/dept.txt' into table dept_partition partition(month='1');

After this load, the table's directory on HDFS contains a subdirectory for the partition, named month=1, and the loaded file is stored inside it.
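
One way to confirm the layout is to list the table directory from inside the Hive CLI. This is only a sketch: the warehouse path below is an assumption, taken from the /user/hive/warehouse/test_6_14.db/ paths used later in this post, and your cluster may be configured differently.

```sql
-- Assumed warehouse location; adjust to your own hive.metastore.warehouse.dir.
-- List the table directory: one month=<value> subdirectory per partition is expected.
dfs -ls /user/hive/warehouse/test_6_14.db/dept_partition;

-- List the files stored inside the month=1 partition directory.
dfs -ls /user/hive/warehouse/test_6_14.db/dept_partition/month=1;
```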
Querying the partitioned table
To query a partitioned table, filter on the partition column in the where clause. For example: hive (test_6_14)> select * from dept_partition where month='1';
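
The partition column behaves like an ordinary column in queries, so it can also appear in in lists and group by. The statements below are a small sketch that assumes partitions month='1' and month='2' both exist in dept_partition.

```sql
-- Read a single partition; Hive only needs to scan the month=1 directory.
select deptno, dname, loc from dept_partition where month='1';

-- Read two partitions in one query.
select * from dept_partition where month in ('1', '2');

-- Count the rows stored in each partition.
select month, count(*) from dept_partition group by month;
```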

Adding partitions

hive (test_6_14)> alter table dept_partition add partition(month='2');
The newly added partition is empty, because no data has been loaded into it yet.

To add several partitions in one statement: hive (test_6_14)> alter table dept_partition add partition(month='3') partition(month='4');

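Dropping partitions

 Drop a single partition: hive (test_6_14)> alter table dept_partition drop partition(month=4);
 Drop several partitions: hive (test_6_14)> alter table dept_partition drop partition(month=3),partition(month=2);
 Note the difference in syntax: when adding several partitions, the partition clauses are separated by spaces; when dropping several partitions, they are separated by commas.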
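
Adding a partition that already exists, or dropping one that does not, normally fails with an error. A possible way to make such statements safe to re-run is the optional if not exists / if exists clauses of alter table, sketched here against the same dept_partition table:

```sql
-- Add partitions only if they are not already present (space-separated clauses).
alter table dept_partition add if not exists partition(month='2');
alter table dept_partition add if not exists partition(month='3') partition(month='4');

-- Drop partitions only if they are present (comma-separated clauses).
alter table dept_partition drop if exists partition(month='4');
alter table dept_partition drop if exists partition(month='3'), partition(month='2');

-- Check which partitions remain.
show partitions dept_partition;
```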

Listing the partitions of a table
show partitions dept_partition;
Here dept_partition is the table name.

Creating a two-level partitioned table

hive (test_6_14)> create table dept_partition2(
                > deptno int,dname string,loc string)
                > partitioned by(month string,day string)
                > row format delimited fields terminated by '\t';
 Declaring two columns in partitioned by(month string,day string) makes this a two-level partitioned table.

Load data into a two-level partition:

load data local inpath "/opt/storage/dept.txt" into table dept_partition2 partition(month=6,day=14);

Query data in a two-level partition: again use where, and join the condition on the second-level partition column with and.

select * from dept_partition2 where month=6 and day=14;
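
With two partition levels it is also possible to work with the first level only. The following sketch assumes the month=6 data loaded above; it lists partitions with a partial partition spec and queries every day under one month.

```sql
-- List every partition of the table, shown as month=<m>/day=<d>.
show partitions dept_partition2;

-- List only the partitions under month=6 (partial partition spec).
show partitions dept_partition2 partition(month='6');

-- Query all days of month 6 by filtering on the first-level column only.
select * from dept_partition2 where month='6';
```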

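If you upload data directly into the table's directory on HDFS and Hive still cannot find it, the reason is that no metadata was created for it, so the metastore has no record linking those files to a partition.
In that case, run the repair command: hive> msck repair table dept_partition2;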

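Likewise, creating a partition directory by hand under the table's path on HDFS has no effect on its own. You also have to add a partition to the table, and its name must match the directory you created exactly.
Example: adding the partition after uploading the data
First, create the directory:
dfs -mkdir -p /user/hive/warehouse/test_6_14.db/dept_partition2/month=6/day=13;
Then upload the file with the HDFS put command:
dfs -put /opt/storage/dept.txt /user/hive/warehouse/test_6_14.db/dept_partition2/month=6/day=13;
At this point HDFS looks as if the partition exists, but Hive does not recognize it. You still need to add the partition in Hive before the data can be queried.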
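
The original text does not show the statement for this last step. A minimal sketch, assuming the month=6/day=13 directory created above, would be:

```sql
-- Register the hand-made directory as a partition; the values must match
-- the directory names month=6/day=13 exactly.
alter table dept_partition2 add partition(month='6', day='13');

-- Alternative: let Hive scan the table directory and register any
-- partition directories missing from the metastore.
-- msck repair table dept_partition2;

-- The uploaded rows should now be visible.
select * from dept_partition2 where month='6' and day='13';
```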

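Note: drop the idea that one block of data means one table. Although the data is added in separate batches and queried separately, it all belongs to a single table that simply has two levels of partitions. Make sure this is clear in your mind.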

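Alternatively, skip -put and upload the data with the load data command. load data creates the partition and its metadata for you, so neither the repair command nor the add partition command is needed.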
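
As a sketch of this route, the statements below load the same file into a new partition and then verify it. The value day='15' is a hypothetical example chosen so it does not collide with partitions created earlier.

```sql
-- Load the local file straight into a partition; Hive creates the
-- month=6/day=15 directory and registers the partition in one step.
load data local inpath '/opt/storage/dept.txt' into table dept_partition2 partition(month='6', day='15');

-- The new partition is listed without msck repair or add partition.
show partitions dept_partition2;

-- The loaded rows can be queried right away.
select * from dept_partition2 where month='6' and day='15';
```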
