Three Min Series - Run the Example of WordCount in PyFlink 1.9

Motivation/动机

PyFlink is a new module recently added to Apache Flink . Many people want to quickly experience how to run PyFlink jobs. This blog will run the WordCount example in PyFlink 1.9 in 3 minutes.

PyFlink是最近添加到Apache Flink的新模块。 许多人想快速体验如何运行PyFlink作业。 本文将在3分钟内完成在PyFlink 1.9中运行WordCount示例。

Steps/操作步骤

  • Check Python Version/检查Python版本
    1
    2
    jincheng:videos jincheng.sunjc$ python --version
    Python 3.7.6

It’s better to use Python 2.7.6+
最好使用Python2.7.6+

  • Download binary distribution/下载二进制发布包
1
curl -O  http://mirrors.gigenet.com/apache/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz
  • Start cluster/启动集群
1
2
3
tar -zxvf flink-1.9.1-bin-scala_2.11.tgz
cd flink-1.9.1
bin/start-cluster.sh

If all goes well, open the console: http://localhost:8081/

  • Submit WordCount job/提交WordCount作业
1
2
3
4
5
6
jincheng:flink-1.9.1 jincheng.sunjc$ ./bin/flink run -py examples/python/table/batch/word_count.py
Starting execution of program
Results directory: /var/folders/fp/s5wvp3md31j6v5gjkvqbkhrm0000gp/T/result
Program execution finished
Job with JobID 12ffba88d96dc7fd78b514aa40bb4b6d has finished.
Job Runtime: 981 ms
  • Check the result
1
2
3
4
5
6
7
8
9
jincheng:flink-1.9.1 jincheng.sunjc$ cat /var/folders/fp/s5wvp3md31j6v5gjkvqbkhrm0000gp/T/result
ASF,2
Apache,2
Foundation,1
...
...
with,2
work,1
you,2
  • Stop cluster/停止集群
1
2
3
jincheng:flink-1.9.1 jincheng.sunjc$ bin/stop-cluster.sh 
Stopping taskexecutor daemon (pid: 2718) on host jincheng.local.
Stopping standalonesession daemon (pid: 2302) on host jincheng.local.

Video presentation/视频演示

Q&A

If you have questions, you can leave a comment or send an email to: sunjincheng121@gmail.com

如有疑问,您可以发表评论或发送电子邮件到:sunjincheng121@gmail.com