Reading a Remote HDFS File with Spark

Author: ssttIsme | Published: 2025-12-27 23:25

First, on the remote virtual machine, create 1.txt with the following content:

Hello Spark
Hello Java

Check the file, upload it to /tmp on HDFS, and list the directory to confirm the upload:

[server@hadoop102 ~]$ cat 1.txt
Hello Spark
Hello Java
[server@hadoop102 ~]$ hadoop fs -put 1.txt /tmp 
2025-12-28 10:17:55,964 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[server@hadoop102 ~]$ hadoop fs -ls /tmp/
Found 5 items
-rw-r--r--   3 server supergroup         23 2025-12-28 10:17 /tmp/1.txt
-rw-r--r--   3 server supergroup      43579 2025-12-28 10:16 /tmp/a.txt
drwxrwxrwt   - server supergroup          0 2021-08-03 17:19 /tmp/hadoop-yarn
drwxrwxrwt   - server supergroup          0 2025-12-20 22:44 /tmp/hive
drwxrwxrwt   - server server              0 2021-08-03 17:22 /tmp/logs
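
The same check can also be made from code before involving Spark. The following is a minimal sketch using the Hadoop FileSystem API, assuming the Hadoop client classes are on the classpath (they normally arrive transitively with spark-core) and that the NameNode is reachable at hadoop102:8020. The object name HdfsFileCheck is hypothetical and not part of the original project.

```scala
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, for illustration only.
object HdfsFileCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: the NameNode listens on hadoop102:8020.
    val fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), new Configuration())
    try {
      val path = new Path("/tmp/1.txt")
      val found = fs.exists(path)
      println(s"exists = $found")
      // Print the size only when the file is actually there.
      if (found) println(s"length = ${fs.getFileStatus(path).getLen} bytes")
    } finally {
      fs.close()
    }
  }
}
```

If the upload succeeded, the reported length should match the 23 bytes shown for /tmp/1.txt in the listing above.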

The Spark program below runs with a local master and reads the file through its full HDFS URI:

package com.spoon.bigdata.core.rdd.builder

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD


object RDD_File {
  def main(args: Array[String]): Unit = {
    // Set up the environment
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("RDD")
    val sc = new SparkContext(sparkConf)

    // Local file alternative:
    // val rdd: RDD[String] = sc.textFile("datas/1.txt")
    // Read the file from the remote HDFS cluster
    val rdd: RDD[String] = sc.textFile("hdfs://hadoop102:8020/tmp/1.txt")
    rdd.collect().foreach(println)
    // Shut down the environment
    sc.stop()
  }
}
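
The argument passed to textFile is a complete HDFS URI rather than a local path. As a small sketch (not part of the original program), java.net.URI can split it into its parts to show what Spark is being pointed at. The object name HdfsUriParts is made up for illustration, and reading the host and port as the NameNode address is an assumption about this cluster's configuration.

```scala
import java.net.URI

// Hypothetical helper, for illustration only.
object HdfsUriParts {
  // The same URI string that the program passes to sc.textFile.
  val uri: URI = new URI("hdfs://hadoop102:8020/tmp/1.txt")

  def main(args: Array[String]): Unit = {
    println(s"scheme = ${uri.getScheme}") // file system type
    println(s"host   = ${uri.getHost}")   // assumed to be the NameNode host
    println(s"port   = ${uri.getPort}")   // assumed to be the NameNode RPC port
    println(s"path   = ${uri.getPath}")   // file path inside HDFS
  }
}
```

If the host name hadoop102 does not resolve on the machine running the program, the read would be expected to fail, so a hosts-file entry or the VM's IP address may be needed there.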

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>spark-demo</artifactId>
        <groupId>com.spoon.bigdata</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>

    <modelVersion>4.0.0</modelVersion>

    <artifactId>spark-core</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
    </dependencies>
    <repositories>
        <repository>
            <id>pentaho-releases</id>
            <url>https://nexus.pentaho.org/content/groups/omni/</url>
        </repository>
        <repository>
            <id>pentaho</id>
            <url>https://repo.pentaho.org/content/groups/omni/</url>
        </repository>
    </repositories>
</project>
