Apache Spark Supported File Systems: Apache Spark is one of the trending technologies in today's IT industry, and it supports many file systems for processing data. The file systems supported by Apache Spark are described below.
Apache Spark Supported File Systems:
Spark can read from and write to a large number of file systems, and we can use any of them with whichever file format we want.
1. Local/“Regular” FS
While Spark supports loading files from the local file system, it requires that the files are available at the same path on all nodes in your cluster.
Spark Example in Scala: Loading a text file from the local file system
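import org.apache.spark.{SparkConf, SparkContext}

// Run Spark locally with a single worker thread
val conf = new SparkConf().setMaster("local").setAppName("LocalFS")
val context = new SparkContext(conf)
// The file:// scheme reads from the local file system
val inputRdd = context.textFile("file:///home/hadoop/input.txt")
inputRdd.saveAsTextFile("/home/output/")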
2. Amazon S3
Amazon S3 is an increasingly popular option for storing large amounts of data. S3 is especially fast when your compute nodes are located inside Amazon EC2, but performance can be much worse if you have to go over the public Internet.
To access S3 in Spark, you should first set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables to your S3 credentials. You can create these credentials from the Amazon Web Services console. Then pass a path starting with s3n:// to Spark's file input methods. Note that newer Hadoop releases replace the s3n:// connector with s3a://.
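The two steps above (credentials in the environment, then an s3n:// path) can be sketched in plain Java. This is only an illustrative helper and not part of Spark: the class S3InputPath, its methods, the bucket my-bucket, and the key input/input.txt are all hypothetical names chosen for this example.

```java
import java.util.Map;

// Hypothetical helper, not a Spark API: prepares an S3 input path for Spark.
class S3InputPath {

    // Builds an s3n:// path from a bucket name and an object key.
    static String s3nPath(String bucket, String key) {
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("bucket must not be empty");
        }
        if (key == null) {
            throw new IllegalArgumentException("key must not be null");
        }
        // Drop a leading slash so the path does not contain "//" after the bucket.
        String cleanKey = key.startsWith("/") ? key.substring(1) : key;
        return "s3n://" + bucket + "/" + cleanKey;
    }

    // Returns true only if both credential variables are present and non-empty.
    static boolean hasCredentials(Map<String, String> env) {
        return isSet(env.get("AWS_ACCESS_KEY_ID"))
            && isSet(env.get("AWS_SECRET_ACCESS_KEY"));
    }

    private static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) {
        if (!hasCredentials(System.getenv())) {
            System.err.println("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first");
            return;
        }
        // Assumed bucket and key; replace them with your own.
        String path = s3nPath("my-bucket", "input/input.txt");
        System.out.println(path);
    }
}
```

The string returned by s3nPath is what you would hand to context.textFile(...), in the same way as the file:// and hdfs:// paths used in the other examples in this article.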
3. HDFS
The Hadoop Distributed File System (HDFS) is a popular distributed file system that works well with Spark. HDFS is designed to run on commodity hardware and to be resilient to node failure while providing high data throughput. Spark and HDFS can be collocated on the same machines.
Using Spark with HDFS is as simple as specifying hdfs://master:port/path for your input and output.
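Spark Example in Java: Loading a text file from HDFS
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setMaster("local").setAppName("HDFSFILESYSTEM");
JavaSparkContext context = new JavaSparkContext(conf);
// Read from and write back to HDFS using hdfs:// URIs
JavaRDD<String> text = context.textFile("hdfs://localhost:8020/home/derby.log");
text.saveAsTextFile("hdfs://localhost:8020/home/derby/");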
These are the main file systems supported by Apache Spark. Spark can also read data from other sources such as relational databases, Cassandra, and MongoDB.