Data mining
Based on TensorBay Action, this example integrates four steps: data crawling, conversion, parsing, and analytics into a complete workflow, giving you a quick overview of the Graviti Data platform.
# A workflow consists of multiple tasks that can run serially or in parallel.
tasks:
  # This workflow includes four tasks: scraper, pdf2txt, parser, and statistics.
  scraper:
    container:
      # The Docker image this task runs on is given below (images from both public and private repositories are available).
      image: hub.graviti.cn/miner/scraper:2.3
      # The command `./archive/run.py {{workflow.parameters.month}}` is executed once the image is running.
      command: [python3]
      args: ["./archive/run.py", "{{workflow.parameters.month}}"]
  pdf2txt:
    # pdf2txt depends on scraper, i.e. it starts running only after scraper has finished.
    dependencies:
      - scraper
    container:
      image: hub.graviti.cn/miner/pdf2txt:2.0
      command: [python3]
      args: ["pdf2txt.py"]
  parser:
    # parser depends on pdf2txt, i.e. it starts running only after pdf2txt has finished.
    dependencies:
      - pdf2txt
    container:
      image: hub.graviti.cn/miner/parser:2.0
      command: [python3]
      args: ["parser.py"]
  statistics:
    # statistics depends on parser, i.e. it starts running only after parser has finished.
    dependencies:
      - parser
    container:
      image: hub.graviti.cn/miner/statistics:2.0
      command: [python3]
      args: ["statistics.py"]
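The `dependencies` fields above form a chain (scraper → pdf2txt → parser → statistics), so each task starts only after its predecessor completes. As a rough illustration of how such a declaration resolves into an execution order — the task names come from the workflow, but the scheduling logic here is a standalone sketch, not TensorBay Action's actual implementation:

```python
from graphlib import TopologicalSorter

# Mirror the workflow's `dependencies` fields: each task maps to the
# tasks it must wait for before it can start.
dependencies = {
    "scraper": [],
    "pdf2txt": ["scraper"],
    "parser": ["pdf2txt"],
    "statistics": ["parser"],
}

# A topological sort yields an execution order that respects every dependency.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['scraper', 'pdf2txt', 'parser', 'statistics']
```

Because every task here depends on exactly one predecessor, the order is a strict serial chain; if two tasks shared the same dependency (e.g. both depended only on `scraper`), the scheduler would be free to run them in parallel.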