Geospark加载PostgreSQL数据库

阳光穿透心脏的1/2处 2023-10-04 13:19 14阅读 0赞

title: (七)GeoSpark与PostgreSQL
date: 2021-05-12 16:38:23
tags:

  • GeoSpark

PostGreSQL以及Mysql均有空间引擎扩展,且GeoSpark是针对大数据的空间分析,而SHP、GeoJson等只是小数据集的,PostGIS就能胜任其分析任务。所以对于大数据集的,我们还是要用空间关系数据库存储,利用Spark SQL从数据库中加载数据,获得DataFrame,然后利用Geospark转为几何弹性数据集RDD。

Geospark加载PostgreSQL数据库

关于PostgreSQL和PostGIS的介绍,后续会有更新,今天还是以Geospark为核心,来介绍如何用GeoSpark加载我们的数据库中的空间数据。

假设我们已经安装了PostGreSQL以及PostGIS扩展,并将我们这里的公园数据(park)导入到PostGreSQL中。

img

这里我的Postgresq的连接参数为:

  1. url=192.168.10.174:5432
  2. table=prks
  3. user=postgres
  4. password=root

首先我们需要加入postgresql的jdbc依赖

  1. <!-- https://mvnrepository.com/artifact/org.postgresql/postgresql -->
  2. <dependency>
  3. <groupId>org.postgresql</groupId>
  4. <artifactId>postgresql</artifactId>
  5. <version>42.2.5</version>
  6. </dependency>

初始化Spark,并注册GeosparkSQL

  1. SparkSession spark = SparkSession.builder().
  2. config("spark.serializer","org.apache.spark.serializer.KryoSerializer").
  3. config("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator").
  4. master("local[*]").appName("Learn07").getOrCreate();
  5. GeoSparkSQLRegistrator.registerAll(spark);
  6. GeoSparkVizRegistrator.registerAll(spark);

利用SparkSQL读取我们导入到postgresql中的数据

首先利用SparkSQL读取我们导入到postgresql中的数据,在postgis中有一个geom字段,他里面存放的是wkb格式的几何信息,对于sparksql来说,geom字段就是一个varchar字段,我们还需要使用geospark的相关函数来将其转为Geometry。

  1. String url = "jdbc:postgresql://192.168.10.174:5432/geospark";
  2. String table = "parks";
  3. Properties connectionProperties = new Properties();
  4. connectionProperties.put("user","postgres");
  5. connectionProperties.put("password","root");
  6. connectionProperties.put("driver","org.postgresql.Driver");
  7. Dataset<Row> df = spark.read().jdbc(url, table, connectionProperties);
  8. df.createOrReplaceTempView("parks");
  9. df.show();
  10. +---+------+---------+--------------------+--------------+---------------+---------------+-------+---------+----------+--------------------+
  11. |gid|parkid|refparkid| parkname| neighborho| ewstreet| nsstreet|dogpark|washrooms|specialfea| geom|
  12. +---+------+---------+--------------------+--------------+---------------+---------------+-------+---------+----------+--------------------+
  13. | 1| 1| -9999| null| Kitsilano| null| null| N| null| null|01060000000100000...|
  14. | 2| 2| 208| Rosemary Brown Park| Kitsilano| W 11th Avenue| Vine Street| N| N| N|01060000000100000...|
  15. | 3| 3| 141| Tea Swamp Park|Mount Pleasant| E 15th Avenue| Sophia Street| N| N| N|01060000000200000...|
  16. | 4| 4| -9999| null| Strathcona| null| null| N| null| null|01060000000200000...|
  17. | 5| 5| 202| Morton Park| West End| Morton Avenue| Denman Street| N| N| N|01060000000200000...|
  18. | 6| 6| -9999| Mcbride Park| Kitsilano| null| null| N| null| null|01060000000200000...|
  19. | 7| 7| -9999| Granville Park| Fairview| null| null| N| null| null|01060000000200000...|
  20. | 8| 8| -9999| null|Mount Pleasant| null| null| N| null| null|01060000000300000...|
  21. | 9| 9| 15| Creekside Park|Mount Pleasant|Terminal Avenue| Quebec Street| N| N| Y|01060000000200000...|
  22. | 10| 10| 134|China Creek South...|Mount Pleasant| E 10th Avenue| Clark Drive| N| N| N|01060000000200000...|

利用Geospark将geom列转为Geometry

从postgresql中读取的geom列为wkb格式的Geometry,因此需要利用Geospark将其转为Geometry。

  1. // 在将DataFrame转为RDD的时候,Geospark默认第一列为Geometry
  2. String sql = "select ST_GeomFromWKB(geom) as geom, parkname, parkid from parks";
  3. df = spark.sql(sql);
  4. df.show();
  5. +--------------------+--------------------+------+
  6. | geom| parkname|parkid|
  7. +--------------------+--------------------+------+
  8. |POLYGON ((-123.15...| null| 1|
  9. |POLYGON ((-123.15...| Rosemary Brown Park| 2|
  10. |MULTIPOLYGON (((-...| Tea Swamp Park| 3|
  11. |MULTIPOLYGON (((-...| null| 4|
  12. |MULTIPOLYGON (((-...| Morton Park| 5|
  13. |MULTIPOLYGON (((-...| Mcbride Park| 6|
  14. |MULTIPOLYGON (((-...| Granville Park| 7|
  15. |MULTIPOLYGON (((-...| null| 8|
  16. |MULTIPOLYGON (((-...| Creekside Park| 9|
  17. |MULTIPOLYGON (((-...|China Creek South...| 10|

Adapter静态类在DataFrame和RDD之间进行转换操作

Geospark提供了Adapter静态类,可以在DataFrame和RDD之间进行转换操作。

  1. JavaRDD<Geometry> rdd = Adapter.toJavaRdd(df);
  2. rdd.foreach((geometry -> {
  3. System.out.println(geometry);
  4. }));
  5. POLYGON ((-123.15566057081632 49.26206733490204, -123.15564728017853 49.26241791476514, -123.15548939905344 49.262415429329856, -123.15550257747702 49.26206484963618, -123.15566057081632 49.26206733490204)) null 1
  6. POLYGON ((-123.15760176703519 49.261936547646954, -123.15718706338478 49.2619299178749, -123.15719832396375 49.26162160945501, -123.15761313807661 49.26162814910161, -123.15760218456263 49.26192530535148, -123.15760176703519 49.261936547646954)) Rosemary Brown Park 2
  7. MULTIPOLYGON (((-123.09870507685639 49.25665872626679, -123.09884321830909 49.25666085651373, -123.09898135977342 49.25666298659548, -123.09918885538166 49.25666622613204, -123.09917717616912 49.25700195924662, -123.09897035961777 49.256998988908855, -123.09883221757528 49.25699703868207, -123.0986933956194 49.25699508887092, -123.09855525341915 49.256993048379265, -123.09856693541536 49.25665659585471, -123.09870507685639 49.25665872626679)), ((-123.0988872811734 49.257388923290726, -123.09874913822517 49.257387062899134, -123.09854232044736 49.257384271663746, -123.09855332287414 49.257047909676935, -123.09876081971025 49.257050970113596, -123.09889896208026 49.25705301035933, -123.0988872811734 49.257388923290726))) Tea Swamp Park 3
  8. MULTIPOLYGON (((-123.08789868681455 49.27697178369337, -123.08779279965931 49.276970245390025, -123.08768827290808 49.27696870595726, -123.0875838595297 49.276967166344015, -123.08747933279125 49.27696562672224, -123.08749381859947 49.2766309730507, -123.0875983449497 49.27663269252672, -123.08770287146653 49.27663450184158, -123.08780728446555 49.27663622121462, -123.08791317138568 49.276638029305445, -123.08789868681455 49.27697178369337)), ((-123.08777557719297 49.27735877134432, -123.08767116251035 49.277356962010295, -123.08756663446789 49.27735515266783, -123.08746210643292 49.2773533432308, -123.08747670583442 49.27701868949804, -123.08758111979606 49.27702049900829, -123.08768564713182 49.277022308338154, -123.0877915350338 49.27702420647329, -123.08777557719297 49.27735877134432))) null 4
  9. MULTIPOLYGON (((-123.14231536153594 49.288085334402886, -123.14152321088636 49.28757269767203, -123.14181135642984 49.28738168560991, -123.1422763290905 49.287682390935885, -123.14231536153594 49.288085334402886)), ((-123.14225003608222 49.28741127411166, -123.14200556426914 49.28725275241468, -123.14221142848969 49.287116250139206, -123.1422217546953 49.287118935432616, -123.14225003608222 49.28741127411166))) Morton Park 5
  10. MULTIPOLYGON (((-123.1815623274226 49.26844720942817, -123.17990316962349 49.26842093595351, -123.17992703362962 49.2677831787414, -123.18158548965377 49.26780936303689, -123.1815623274226 49.26844720942817)), ((-123.17948874889649 49.26841438577292, -123.17783027685374 49.26838907073858, -123.17785348555584 49.267750775367986, -123.17951250489656 49.267776628815625, -123.17948874889649 49.26841438577292))) Mcbride Park 6
  11. MULTIPOLYGON (((-123.14133650443698 49.25886332633381, -123.14136437634508 49.25810730827778, -123.14199539077381 49.25811759634792, -123.14262005884645 49.25812770893823, -123.14262638528903 49.25796357183548, -123.14346603622347 49.25797737135294, -123.14345971282438 49.25814159843223, -123.14343107963043 49.25889761794439, -123.14196594195184 49.25887361649547, -123.14133650443698 49.25886332633381)), ((-123.14578559773607 49.25853754368109, -123.14372265753794 49.25850460441758, -123.1437353757422 49.25816230034241, -123.14579830035647 49.25819478972661, -123.14578559773607 49.25853754368109))) Granville Park 7
  12. MULTIPOLYGON (((-123.08896518268944 49.263513027273426, -123.08895956291931 49.263854420100515, -123.08824281220937 49.263848674370315, -123.08824279938008 49.26384147968065, -123.0882483235879 49.26350728166151, -123.08896518268944 49.263513027273426)), ((-123.08945644895253 49.26424084147778, -123.08923033699166 49.26423876797776, -123.08923596790757 49.26390456985079, -123.08946207752874 49.263906193672696, -123.08945644895253 49.26424084147778)), ((-123.0889541299678 49.264236732613874, -123.08876100016299 49.264235172400404, -123.08876652075031 49.26390088444774, -123.08876650784482 49.26389368975824, -123.08895895645973 49.26389534042054, -123.0889541299678 49.264236732613874))) null 8

参考链接:https://www.jianshu.com/p/fab6d59e84ed

发表评论

表情:
评论列表 (有 0 条评论,14人围观)

还没有评论,来说两句吧...

相关阅读