博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
[Spark][Python]DataFrame select 操作例子
阅读量:4699 次
发布时间:2019-06-09

本文共 4015 字,大约阅读时间需要 13 分钟。

 的 继续

In [4]: peopleDF.select("age")
Out[4]: DataFrame[age: bigint]

In [5]: myDF=people.select("age")

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-b5b723b62a49> in <module>()
----> 1 myDF=people.select("age")

NameError: name 'people' is not defined

In [6]: myDF=peopleDF.select("age")

In [7]: myDF.take(3)

17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 230.1 KB, free 871.7 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 21.4 KB, free 893.1 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:55073 (size: 21.4 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 5 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 251.1 KB, free 1144.2 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 21.6 KB, free 1165.8 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on localhost:55073 (size: 21.6 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 6 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO mapred.FileInputFormat: Total input paths to process : 1
17/10/05 05:13:03 INFO spark.SparkContext: Starting job: take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Got job 2 (take at <ipython-input-7-745486715568>:1) with 1 output partitions
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1), which has no missing parents
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 4.3 KB, free 1170.2 KB)
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 2.5 KB, free 1172.6 KB)
17/10/05 05:13:03 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on localhost:55073 (size: 2.5 KB, free: 208.7 MB)
17/10/05 05:13:03 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1006
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)
17/10/05 05:13:03 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
17/10/05 05:13:03 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179
17/10/05 05:13:03 INFO codegen.GenerateUnsafeProjection: Code generated in 113.719806 ms
17/10/05 05:13:03 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 2235 bytes result sent to driver
17/10/05 05:13:03 INFO scheduler.DAGScheduler: ResultStage 2 (take at <ipython-input-7-745486715568>:1) finished in 0.493 s
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 487 ms on localhost (1/1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Job 2 finished: take at <ipython-input-7-745486715568>:1, took 0.737231 s
Out[7]: [Row(age=None), Row(age=30), Row(age=19)]

In [8]:

转载于:https://www.cnblogs.com/gaojian/p/7629891.html

你可能感兴趣的文章
Bootstrap概述
查看>>
elementUi源码解析(1)--项目结构篇
查看>>
C#中用DateTime的ParseExact方法解析日期时间(excel中使用系统默认的日期格式)
查看>>
任务二 阅读报告
查看>>
高阶函数
查看>>
W3100SM-S 短信猫代码发送 上
查看>>
【Netty】EventLoop和线程模型
查看>>
打开一个页面,并监听该页面的关闭事件
查看>>
软件保护技术--- 常见保护技巧
查看>>
SQL批量分离工具
查看>>
Erlang与ActionScript3采用JSON格式进行Socket通讯
查看>>
python数据类型--数字、字符串
查看>>
CentOS7使用firewalld打开关闭防火墙与端口
查看>>
Servlet实现图片读取显示
查看>>
正则表达式去除括号的问题
查看>>
基于深度学习的目标检测研究进展
查看>>
python运算学习之Numpy ------ 数组操作:连接数组、拆分数组 、广播机制、结构化数组、文件贮存与读写、np.where、数组去重...
查看>>
无熟人难办事?—迪米特法则
查看>>
POJ 1469 COURSES 二分图最大匹配
查看>>
13.python中的字典
查看>>