What is the “task” in Storm parallelism?

question

I’m trying to learn Twitter Storm by following the great article “Understanding the parallelism of a Storm topology”.

However, I’m a bit confused by the concept of a “task”. Is a task a running instance of a component (spout or bolt)? An executor having multiple tasks is really saying that the same component is executed multiple times by that executor, am I correct?

Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to parallelism? Since a thread executes sequentially, I think having multiple tasks in a thread only makes the thread a kind of “cached” resource, which avoids spawning a new thread for the next task run. Am I correct?
—-

answer

However, I’m a bit confused by the concept of a “task”. Is a task a running instance of a component (spout or bolt)? An executor having multiple tasks is really saying that the same component is executed multiple times by that executor, am I correct?

Yes, and yes.

Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to parallelism?

Running more than one task per executor does not increase the level of parallelism — an executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor.
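
To make the "tasks run serially" point concrete, here is a minimal sketch in plain Python. It does not use Storm; `Task` and `run_executor` are hypothetical names invented for illustration, and the loop is a simplification of what a real executor does. It only shows that one thread serving two tasks interleaves their work instead of running it in parallel.

```python
import threading


class Task:
    """Hypothetical stand-in for one instance of a spout or bolt."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.thread_names = []  # which thread ran each call

    def execute(self, tup):
        self.thread_names.append(threading.current_thread().name)
        return (self.task_id, tup)


def run_executor(tasks, tuples_per_task, log):
    # One executor is one thread: it visits its tasks one after another.
    for tup in range(tuples_per_task):
        for task in tasks:
            log.append(task.execute(tup))


tasks = [Task(1), Task(2)]
log = []
executor = threading.Thread(
    target=run_executor, args=(tasks, 2, log), name="executor-1"
)
executor.start()
executor.join()
```

After the run, `log` holds the calls in strictly alternating order and every call was made from the thread named `executor-1`, so adding a second task to the executor added no concurrency.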

As I wrote in the article, please note that:

  • The number of executor threads can be changed after the topology has been started (see the storm rebalance command).
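  • The number of tasks of a topology is static; it cannot be changed after the topology has been started. One simple way to understand this: the fields grouping strategy in stream grouping effectively routes tuples by hash mod the number of tasks, so if the number of tasks changed, a key could be routed to a different task than before.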
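
One common way to see why a fixed task count matters is key-based routing such as fields grouping, which can be thought of as picking a target task by hash modulo the number of tasks. The sketch below is plain Python, not Storm code; `route` is a hypothetical helper and the integer keys stand in for already-computed hash values.

```python
def route(key_hash, num_tasks):
    """Pick a target task index for a key, modulo-style (illustrative only)."""
    return key_hash % num_tasks


key_hashes = list(range(10))  # pretend these are hashes of grouping fields

before = {k: route(k, 4) for k in key_hashes}  # topology with 4 tasks
after = {k: route(k, 5) for k in key_hashes}   # same keys if there were 5 tasks

# Keys whose target task would differ after the change.
moved = [k for k in key_hashes if before[k] != after[k]]
```

Here six of the ten keys would land on a different task after going from 4 to 5 tasks, so any per-key state held by the old task would no longer receive that key's tuples.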

And by definition there is the invariant of #executors <= #tasks.
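
The invariant follows from the fact that every executor must run at least one task. As a rough illustration, the sketch below spreads task ids as evenly as possible over executors. `assign_tasks` is a hypothetical function, and the even split is an assumption made for illustration rather than Storm's actual scheduling code.

```python
def assign_tasks(num_tasks, num_executors):
    """Split task ids 0..num_tasks-1 into one contiguous chunk per executor."""
    if not 1 <= num_executors <= num_tasks:
        # Enforces the invariant: #executors <= #tasks.
        raise ValueError("need 1 <= num_executors <= num_tasks")
    base, extra = divmod(num_tasks, num_executors)
    assignment = []
    start = 0
    for i in range(num_executors):
        # The first `extra` executors take one additional task.
        size = base + (1 if i < extra else 0)
        assignment.append(list(range(start, start + size)))
        start += size
    return assignment
```

For example, `assign_tasks(4, 2)` gives two executors with two tasks each, while asking for 3 executors with only 2 tasks raises an error because one executor would have nothing to run.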