Output logs from a run where the PYSPARK_PYTHON environment variable was removed from 'spark-env.sh'.
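For context, here is a minimal sketch of the line whose removal produces the failure below. The interpreter path is inferred from the conda environment visible in the traceback and is an assumption, not the exact line that was removed:

    # spark-env.sh (sketch): point Spark's Python workers at an explicit
    # interpreter so executors do not fall back to a bare "python" on PATH.
    export PYSPARK_PYTHON=/home/orwa/anaconda3/envs/spark1/bin/python

With this export absent, every PySpark task in the run below fails while spawning its Python worker. The full output follows.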
20/06/19 13:30:06 WARN Utils: Your hostname, orwa-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.198.131 instead (on interface ens33)
20/06/19 13:30:06 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/19 13:30:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-06-19 13:30:10.735978: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-06-19 13:30:10.736517: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-06-19 13:30:10.736634: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/19 13:30:12 INFO SparkContext: Running Spark version 2.4.5
20/06/19 13:30:12 INFO SparkContext: Submitted application: keras_spark_mnist
20/06/19 13:30:12 INFO SecurityManager: Changing view acls to: orwa
20/06/19 13:30:12 INFO SecurityManager: Changing modify acls to: orwa
20/06/19 13:30:12 INFO SecurityManager: Changing view acls groups to:
20/06/19 13:30:12 INFO SecurityManager: Changing modify acls groups to:
20/06/19 13:30:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(orwa); groups with view permissions: Set(); users with modify permissions: Set(orwa); groups with modify permissions: Set()
20/06/19 13:30:13 INFO Utils: Successfully started service 'sparkDriver' on port 42371.
20/06/19 13:30:13 INFO SparkEnv: Registering MapOutputTracker
20/06/19 13:30:13 INFO SparkEnv: Registering BlockManagerMaster
20/06/19 13:30:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/19 13:30:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/19 13:30:13 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-94746bda-4fda-4f51-af0d-70b72156aed0
20/06/19 13:30:13 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/06/19 13:30:13 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/19 13:30:13 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/06/19 13:30:13 INFO Utils: Successfully started service 'SparkUI' on port 4041.
20/06/19 13:30:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.198.131:4041
20/06/19 13:30:13 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://192.168.198.131:7077...
20/06/19 13:30:14 INFO TransportClientFactory: Successfully created connection to /192.168.198.131:7077 after 93 ms (0 ms spent in bootstraps)
20/06/19 13:30:14 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20200619133014-0001
20/06/19 13:30:14 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200619133014-0001/0 on worker-20200619132437-192.168.198.131-41425 (192.168.198.131:41425) with 4 core(s)
20/06/19 13:30:14 INFO StandaloneSchedulerBackend: Granted executor ID app-20200619133014-0001/0 on hostPort 192.168.198.131:41425 with 4 core(s), 1024.0 MB RAM
20/06/19 13:30:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41585.
20/06/19 13:30:14 INFO NettyBlockTransferService: Server created on 192.168.198.131:41585
20/06/19 13:30:14 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/19 13:30:14 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200619133014-0001/0 is now RUNNING
20/06/19 13:30:14 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.198.131, 41585, None)
20/06/19 13:30:14 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.198.131:41585 with 366.3 MB RAM, BlockManagerId(driver, 192.168.198.131, 41585, None)
20/06/19 13:30:14 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.198.131, 41585, None)
20/06/19 13:30:14 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.198.131, 41585, None)
20/06/19 13:30:14 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
20/06/19 13:30:15 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/orwa/spark_files/spark-warehouse').
20/06/19 13:30:15 INFO SharedState: Warehouse path is 'file:/home/orwa/spark_files/spark-warehouse'.
20/06/19 13:30:16 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/06/19 13:30:17 INFO InMemoryFileIndex: It took 174 ms to list leaf files for 1 paths.
20/06/19 13:30:18 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.198.131:59444) with ID 0
20/06/19 13:30:18 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.198.131:33457 with 366.3 MB RAM, BlockManagerId(0, 192.168.198.131, 33457, None)
20/06/19 13:30:20 INFO FileSourceStrategy: Pruning directories with:
20/06/19 13:30:20 INFO FileSourceStrategy: Post-Scan Filters:
20/06/19 13:30:20 INFO FileSourceStrategy: Output Data Schema: struct<label: double>
20/06/19 13:30:20 INFO FileSourceScanExec: Pushed Filters:
20/06/19 13:30:21 INFO CodeGenerator: Code generated in 373.585768 ms
20/06/19 13:30:21 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 283.2 KB, free 366.0 MB)
20/06/19 13:30:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.4 KB, free 366.0 MB)
20/06/19 13:30:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.198.131:41585 (size: 23.4 KB, free: 366.3 MB)
20/06/19 13:30:22 INFO SparkContext: Created broadcast 0 from broadcast at LibSVMRelation.scala:153
20/06/19 13:30:22 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4843402 bytes, open cost is considered as scanning 4194304 bytes.
20/06/19 13:30:22 INFO SparkContext: Starting job: treeAggregate at OneHotEncoderEstimator.scala:487
20/06/19 13:30:22 INFO DAGScheduler: Got job 0 (treeAggregate at OneHotEncoderEstimator.scala:487) with 4 output partitions
20/06/19 13:30:22 INFO DAGScheduler: Final stage: ResultStage 0 (treeAggregate at OneHotEncoderEstimator.scala:487)
20/06/19 13:30:22 INFO DAGScheduler: Parents of final stage: List()
20/06/19 13:30:22 INFO DAGScheduler: Missing parents: List()
20/06/19 13:30:22 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[6] at treeAggregate at OneHotEncoderEstimator.scala:487), which has no missing parents
20/06/19 13:30:22 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.5 KB, free 366.0 MB)
20/06/19 13:30:22 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.5 KB, free 366.0 MB)
20/06/19 13:30:22 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.198.131:41585 (size: 6.5 KB, free: 366.3 MB)
20/06/19 13:30:22 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1163
20/06/19 13:30:22 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at treeAggregate at OneHotEncoderEstimator.scala:487) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
20/06/19 13:30:22 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
20/06/19 13:30:22 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.198.131, executor 0, partition 0, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:22 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.198.131, executor 0, partition 1, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:22 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 192.168.198.131, executor 0, partition 2, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:22 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 192.168.198.131, executor 0, partition 3, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:23 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.198.131:33457 (size: 6.5 KB, free: 366.3 MB)
20/06/19 13:30:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.198.131:33457 (size: 23.4 KB, free: 366.3 MB)
20/06/19 13:30:28 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 5798 ms on 192.168.198.131 (executor 0) (1/4)
20/06/19 13:30:32 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 10031 ms on 192.168.198.131 (executor 0) (2/4)
20/06/19 13:30:33 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 10382 ms on 192.168.198.131 (executor 0) (3/4)
20/06/19 13:30:33 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 10458 ms on 192.168.198.131 (executor 0) (4/4)
20/06/19 13:30:33 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/06/19 13:30:33 INFO DAGScheduler: ResultStage 0 (treeAggregate at OneHotEncoderEstimator.scala:487) finished in 10.576 s
20/06/19 13:30:33 INFO DAGScheduler: Job 0 finished: treeAggregate at OneHotEncoderEstimator.scala:487, took 10.696725 s
2020-06-19 13:30:33.909324: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-06-19 13:30:33.909390: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-19 13:30:33.909426: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (orwa-virtual-machine): /proc/driver/nvidia/version does not exist
2020-06-19 13:30:33.939209: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-19 13:30:33.952595: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2020-06-19 13:30:33.953395: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5629f5a4cdb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-19 13:30:33.953477: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
num_partitions=40
writing dataframes
train_data_path=file:///home/orwa/tmp/intermediate_train_data.0
val_data_path=file:///home/orwa/tmp/intermediate_val_data.0
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 12
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 22
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 19
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 8
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 7
20/06/19 13:30:35 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.198.131:41585 in memory (size: 6.5 KB, free: 366.3 MB)
20/06/19 13:30:35 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.198.131:33457 in memory (size: 6.5 KB, free: 366.3 MB)
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 24
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 28
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 13
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 11
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 15
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 14
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 25
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 17
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 29
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 26
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 30
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 18
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 9
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 21
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 20
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 16
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 6
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 27
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 10
20/06/19 13:30:35 INFO ContextCleaner: Cleaned accumulator 23
20/06/19 13:30:35 INFO FileSourceStrategy: Pruning directories with:
20/06/19 13:30:35 INFO FileSourceStrategy: Post-Scan Filters:
20/06/19 13:30:35 INFO FileSourceStrategy: Output Data Schema: struct<label: double, features: vector>
20/06/19 13:30:35 INFO FileSourceScanExec: Pushed Filters:
20/06/19 13:30:35 INFO CodeGenerator: Code generated in 182.410044 ms
20/06/19 13:30:35 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 283.2 KB, free 365.7 MB)
20/06/19 13:30:35 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 23.4 KB, free 365.7 MB)
20/06/19 13:30:35 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.198.131:41585 (size: 23.4 KB, free: 366.3 MB)
20/06/19 13:30:35 INFO SparkContext: Created broadcast 2 from broadcast at LibSVMRelation.scala:153
20/06/19 13:30:35 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4843402 bytes, open cost is considered as scanning 4194304 bytes.
20/06/19 13:30:36 INFO SparkContext: Starting job: runJob at PythonRDD.scala:153
20/06/19 13:30:36 INFO DAGScheduler: Got job 1 (runJob at PythonRDD.scala:153) with 1 output partitions
20/06/19 13:30:36 INFO DAGScheduler: Final stage: ResultStage 1 (runJob at PythonRDD.scala:153)
20/06/19 13:30:36 INFO DAGScheduler: Parents of final stage: List()
20/06/19 13:30:36 INFO DAGScheduler: Missing parents: List()
20/06/19 13:30:36 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[12] at RDD at PythonRDD.scala:53), which has no missing parents
20/06/19 13:30:36 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 42.5 KB, free 365.7 MB)
20/06/19 13:30:36 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 17.1 KB, free 365.6 MB)
20/06/19 13:30:36 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.198.131:41585 (size: 17.1 KB, free: 366.2 MB)
20/06/19 13:30:36 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1163
20/06/19 13:30:36 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (PythonRDD[12] at RDD at PythonRDD.scala:53) (first 15 tasks are for partitions Vector(0))
20/06/19 13:30:36 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
20/06/19 13:30:36 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 4, 192.168.198.131, executor 0, partition 0, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:36 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.198.131:33457 (size: 17.1 KB, free: 366.3 MB)
20/06/19 13:30:37 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, 192.168.198.131, executor 0): java.io.IOException: Cannot run program "python": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:197)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:122)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:109)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 16 more
20/06/19 13:30:37 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 5, 192.168.198.131, executor 0, partition 0, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:37 INFO TaskSetManager: Lost task 0.1 in stage 1.0 (TID 5) on 192.168.198.131, executor 0: java.io.IOException (Cannot run program "python": error=2, No such file or directory) [duplicate 1]
20/06/19 13:30:37 INFO TaskSetManager: Starting task 0.2 in stage 1.0 (TID 6, 192.168.198.131, executor 0, partition 0, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:37 INFO TaskSetManager: Lost task 0.2 in stage 1.0 (TID 6) on 192.168.198.131, executor 0: java.io.IOException (Cannot run program "python": error=2, No such file or directory) [duplicate 2]
20/06/19 13:30:37 INFO TaskSetManager: Starting task 0.3 in stage 1.0 (TID 7, 192.168.198.131, executor 0, partition 0, PROCESS_LOCAL, 8251 bytes)
20/06/19 13:30:37 INFO TaskSetManager: Lost task 0.3 in stage 1.0 (TID 7) on 192.168.198.131, executor 0: java.io.IOException (Cannot run program "python": error=2, No such file or directory) [duplicate 3]
20/06/19 13:30:37 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
20/06/19 13:30:37 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
20/06/19 13:30:37 INFO TaskSchedulerImpl: Cancelling stage 1
20/06/19 13:30:37 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage cancelled
20/06/19 13:30:37 INFO DAGScheduler: ResultStage 1 (runJob at PythonRDD.scala:153) failed in 1.137 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, 192.168.198.131, executor 0): java.io.IOException: Cannot run program "python": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:197)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:122)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:109)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 16 more
Driver stacktrace:
20/06/19 13:30:37 INFO DAGScheduler: Job 1 failed: runJob at PythonRDD.scala:153, took 1.153956 s
Traceback (most recent call last):
  File "/home/orwa/spark_files/keras_spark_mnist.py", line 115, in <module>
    keras_model = keras_estimator.fit(train_df).setOutputCols(['label_prob'])
  File "/home/orwa/anaconda3/envs/spark1/lib/python3.7/site-packages/horovod/spark/common/estimator.py", line 37, in fit
    return super(HorovodEstimator, self).fit(df, params)
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 132, in fit
  File "/home/orwa/anaconda3/envs/spark1/lib/python3.7/site-packages/horovod/spark/common/estimator.py", line 78, in _fit
    verbose=self.getVerbose()) as dataset_idx:
  File "/home/orwa/anaconda3/envs/spark1/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/orwa/anaconda3/envs/spark1/lib/python3.7/site-packages/horovod/spark/common/util.py", line 636, in prepare_data
    num_partitions, num_processes, verbose)
  File "/home/orwa/anaconda3/envs/spark1/lib/python3.7/site-packages/horovod/spark/common/util.py", line 544, in _get_or_create_dataset
    df = df.rdd.map(to_petastorm).toDF()
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 58, in toDF
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 746, in createDataFrame
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 390, in _createFromRDD
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 361, in _inferSchema
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1378, in first
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1360, in take
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/context.py", line 1069, in runJob
  File "/home/orwa/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/orwa/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/orwa/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, 192.168.198.131, executor 0): java.io.IOException: Cannot run program "python": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:197)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:122)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:109)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 16 more
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:153)
    at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot run program "python": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:197)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:122)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:95)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:109)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 16 more
20/06/19 13:30:38 INFO SparkContext: Invoking stop() from shutdown hook
20/06/19 13:30:38 INFO SparkUI: Stopped Spark web UI at http://192.168.198.131:4041
20/06/19 13:30:38 INFO StandaloneSchedulerBackend: Shutting down all executors
20/06/19 13:30:38 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
20/06/19 13:30:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/19 13:30:38 INFO MemoryStore: MemoryStore cleared
20/06/19 13:30:38 INFO BlockManager: BlockManager stopped
20/06/19 13:30:38 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/19 13:30:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/19 13:30:38 INFO SparkContext: Successfully stopped SparkContext
20/06/19 13:30:38 INFO ShutdownHookManager: Shutdown hook called
20/06/19 13:30:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-8fa3f21f-7907-44dd-a936-0b9ba9f77e82
20/06/19 13:30:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-0115e419-bf82-48ac-9597-a16ef73870fb
20/06/19 13:30:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-8fa3f21f-7907-44dd-a936-0b9ba9f77e82/pyspark-cc0f0128-9595-46e1-b8e6-ad1fa34791d5 |
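Diagnosis: PythonWorkerFactory launches the executor-side Python worker with the interpreter named by PYSPARK_PYTHON; with that variable removed it falls back to the literal command "python", which does not exist on this host (error=2), so task 0 of stage 1 fails four times and the job aborts. Note that job 0 (the JVM-side treeAggregate from OneHotEncoderEstimator) succeeds because it never needs a Python worker; only the PythonRDD job that follows does. Besides restoring the export in 'spark-env.sh' as sketched above, a per-job alternative is to pass the interpreter on the command line; the interpreter path below is the same assumed conda path, while the master URL and script path are taken from the log:

    spark-submit \
      --master spark://192.168.198.131:7077 \
      --conf spark.pyspark.python=/home/orwa/anaconda3/envs/spark1/bin/python \
      /home/orwa/spark_files/keras_spark_mnist.py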