# Using Spark from Java: Study Notes

==Author: YB-Chi==

[toc]

## Configuration files

### core-site.xml

````xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/yarn/tmp</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/yarn/journal</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>EPRI-DCLOUD-HDP01</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>yarn</value>
  </property>
</configuration>
````
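
To confirm the client actually sees this file, load it into a Hadoop `Configuration` and read a property back. A minimal sketch (my addition, not from the original post), assuming `core-site.xml` is on the application classpath:

````java
import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Explicitly load the site file from the classpath; Configuration
        // would also pick up core-site.xml on its own if it is there.
        conf.addResource("core-site.xml");
        // Should print hdfs://ns1 if the file was found.
        System.out.println(conf.get("fs.defaultFS"));
    }
}
````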

### hdfs-site.xml

Copied from `hadoop/etc/hadoop` on the cluster. The last property there is not needed:

````xml
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/yarn/hadoop-2.6.0/etc/hadoop/excludes</value>
</property>
````

The required configuration is as follows:

````xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn0,nn1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn0</name>
    <value>EPRI-DCLOUD-HDP01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn0</name>
    <value>EPRI-DCLOUD-HDP01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>EPRI-DCLOUD-HDP02:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>EPRI-DCLOUD-HDP02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://EPRI-DCLOUD-HDP01:8485;EPRI-DCLOUD-HDP02:8485;EPRI-DCLOUD-HDP03:8485/ns1</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>EPRI-DCLOUD-HDP01,EPRI-DCLOUD-HDP02,EPRI-DCLOUD-HDP03</value>
  </property>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>60000</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/yarn/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/yarn/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/yarn/name</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.qjournal.write-txns.timeout.ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
````
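
With both site files on the classpath, a client can address the NameNode pair through the `ns1` nameservice instead of a fixed host; the `ConfiguredFailoverProxyProvider` configured above picks the active NameNode. A minimal sketch of verifying this (my addition, not from the original post):

````java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHaCheck {
    public static void main(String[] args) throws Exception {
        // core-site.xml and hdfs-site.xml are read from the classpath,
        // so "hdfs://ns1" resolves through the failover proxy provider.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://ns1"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}
````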

### hive-site.xml

Copied over from Hive's `conf` directory; Spark reads it from the classpath when a `HiveContext` is created, so it must be packaged with the application.

### Required JAR files

The JARs are pulled down from the Linux cluster.

### Packaging the JAR

## Code

### demo1

````java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class Demo1 {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("hivePartition");
        SparkContext sc = new SparkContext(conf);
        HiveContext hiveCtx = new HiveContext(sc);

        String sql = "select * from fjudm4.hbase_md_load_bus_hisdata limit 1";
        // collect() pulls every matching row back to the driver,
        // so keep the limit small.
        Row[] rows = hiveCtx.sql(sql).collect();

        for (int i = 0; i < rows.length; i++) {
            Row row = rows[i];
            System.out.println("row " + i + ": " + row.toString());
        }
        sc.stop();
    }
}
````

`rows` holds every record the query returned. Note that `toString()` on the array itself only prints the array reference, not the data. Each `Row` is a single record (not an array or a list) and prints in this format:

````
[20161103,115967690991992834,null,福建.半兰山/220kV.1负荷,null,ss,r,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2016-11-04,2016-11-03]
````

`row.get(i)` works much like `rs.getString(i)` on a JDBC `ResultSet`.
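
Beyond the untyped `get(i)`, `Row` also offers typed accessors. A small sketch of reading typed fields out of one record; the column indices and types here are assumptions guessed from the sample output above, not part of the original demo:

````java
import org.apache.spark.sql.Row;

public class RowFields {
    // Hypothetical helper: indices and types inferred from the sample record.
    static void printFields(Row row) {
        String day = row.getString(0);      // e.g. "20161103" (assumed string)
        // Guard nullable columns before doing a typed read.
        String name = row.isNullAt(3) ? "(null)" : row.getString(3);
        double firstValue = row.getDouble(7);  // first numeric measurement
        System.out.println(day + " | " + name + " | " + firstValue
                + " | " + row.length() + " columns");
    }
}
````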

## Problems encountered during development

The Thrift classes bundled in `phoenix-4.6.0-HBase-1.1-client.jar` conflict with those in `spark-assembly-1.5.2-hadoop2.6.0.jar`:

````
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:174)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:177)
at com.sgcc.epri.dcloud.dm_hive_datacheck.query.impl.HiveQueryImpl.queryCount(HiveQueryImpl.java:59)
at com.sgcc.epri.dcloud.dm_hive_datacheck.common.QueryAndCompare.doCheck(QueryAndCompare.java:48)
at com.sgcc.epri.dcloud.dm_hive_datacheck.common.Common.check(Common.java:35)
at com.sgcc.epri.dcloud.dm_hive_datacheck.main.MainCompare.main(MainCompare.java:18)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 25 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 30 more
Caused by: java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class;
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$set_ugi_args.write(ThriftHiveMetastore.java)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:63)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_set_ugi(ThriftHiveMetastore.java:3260)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:3251)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:352)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
... 35 more
````
Deleting the Thrift `scheme` package from the Phoenix JAR resolved the conflict.
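
When two JARs fight over the same package, it helps to ask the JVM which JAR a class was actually loaded from. A standard diagnostic sketch (my addition, not from the original post):

````java
import org.apache.thrift.protocol.TProtocol;

public class WhichJar {
    public static void main(String[] args) {
        // Prints the JAR that TProtocol was loaded from, which shows
        // whether the Phoenix copy or the Spark assembly copy won.
        System.out.println(
            TProtocol.class.getProtectionDomain().getCodeSource().getLocation());
    }
}
````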



A Hive SQL query that returned no data:
````sql
select * from fjudm4.HBASE_MD_MEASANALOG_BREAKER where timestamp >= to_date('2016-11-01 00:00:00.0') and timestamp < to_date('2016-11-01 00:05:00.0') limit 3;
````

The SQL is wrong: this Hive table cannot be filtered by comparing the `timestamp` column directly. Filter on the date partition column instead, e.g. `date = '2016-11-01'`.
