# Data mining

#### 1. Create a dataset

a. In TensorBay, go to your Private workspace or Team workspace and click *Create a New Dataset*.

![](/files/vvtaibSdBAWOSaQQqBrl)

#### 2. Add an AccessKey as a secret

a. Go to the Developer Tools page, click *Create AccessKey*, and copy the key.

![](/files/NKm9zoZ1HWEKYydtsvfa)

b. Open the page of the dataset you created. On the Settings page, click *Action Configuration* and then *Create Secret*.

![](/files/eWxLDMEQykP2wOEplmrw)

c. Name the secret `accesskey` and paste the AccessKey you copied in step a as its value.

![](/files/QzTV42AMOH4aQ4wQ9k5Z)

#### 3. Create a workflow

a. Click *Create Workflow* on the Action page.

![](/files/h3JavKlcCnro70zHwn2Z)

b. Fill in the *workflow name*. (Note: workflow names can contain only lowercase letters, numbers, and hyphens, must be at least 2 characters long, and must not start with a hyphen.)

![](/files/9goOKerRiJXPr4w3Vfrc)
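The naming rules in the note above can be expressed as a single regular expression. The following is a minimal sketch based only on those rules; `is_valid_workflow_name` is a hypothetical helper, not part of TensorBay, and the platform may apply further checks (for example, a maximum length) that are not documented here.

```python
import re

# Hypothetical check mirroring the stated naming rules:
# lowercase letters, digits and hyphens only, at least 2 characters,
# and the first character must not be a hyphen.
_WORKFLOW_NAME = re.compile(r"[a-z0-9][a-z0-9-]+")


def is_valid_workflow_name(name: str) -> bool:
    """Return True if `name` satisfies the stated workflow naming rules."""
    return _WORKFLOW_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["data-mining", "a", "-mining", "Data-Mining", "miner2"]:
        print(f"{candidate!r}: {is_valid_workflow_name(candidate)}")
```

Using `fullmatch` means the whole name must match, so a valid prefix followed by an invalid character (such as `data_mining`) is rejected.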

c. Choose the workflow *trigger* mode. (Default: manual.)

![](/files/Gu1vO5HAt4FozOdfX7z2)

d. Configure the workflow *parameter*. (Note: in this example, the parameter `month` is passed to the scraper image as a command-line argument and sets the month of the papers to crawl. The default is 1.)

![](/files/vOTm5m1xhsUvB0y8GjRD)
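The parameter reaches the scraper through the `{{workflow.parameters.month}}` placeholder in that task's `args` (see the YAML file in step f). The sketch below assumes the platform performs plain string substitution before starting the container; `render_args` is a hypothetical helper that only illustrates how a parameter value becomes a command-line argument, not how TensorBay implements it.

```python
import re

# Matches placeholders such as {{workflow.parameters.month}} and captures the name.
_PLACEHOLDER = re.compile(r"\{\{\s*workflow\.parameters\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_args(args, parameters):
    """Replace each {{workflow.parameters.NAME}} in `args` with str(parameters[NAME]).

    Hypothetical illustration only. Raises KeyError if a placeholder
    refers to a parameter that is not defined.
    """
    def substitute(match):
        return str(parameters[match.group(1)])

    return [_PLACEHOLDER.sub(substitute, arg) for arg in args]


if __name__ == "__main__":
    scraper_args = ["./archive/run.py", "{{workflow.parameters.month}}"]
    # Default value of the parameter.
    print(["python3"] + render_args(scraper_args, {"month": 1}))
    # Value used when running the workflow in step 4.
    print(["python3"] + render_args(scraper_args, {"month": 10}))
```

With the default value the scraper would receive `1` as its argument; with the value 10 it would receive `10`.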

e. Configure the workflow *Instance*.

![](/files/tQIEV97Fd7Sp7iYlV8KQ)

f. Use the following code to create the workflow *YAML* file.

```yaml
# A workflow consists of multiple tasks that can run serially or in parallel.
tasks:
  # This workflow includes four tasks: scraper, pdf2txt, parser, and statistics.
  scraper:
    container:
      # The Docker image this task runs (images from both public and private repositories are supported).
      image: hub.graviti.cn/miner/scraper:2.3

      # After the container starts, it runs `python3 ./archive/run.py {{workflow.parameters.month}}`.
      command: [python3]
      args: ["./archive/run.py", "{{workflow.parameters.month}}"]
  pdf2txt:
    # pdf2txt depends on scraper, i.e. it only starts running after scraper has finished running
    dependencies:
      - scraper
    container:
      image: hub.graviti.cn/miner/pdf2txt:2.0
      command: [python3]
      args: ["pdf2txt.py"]
  parser:
    # parser depends on pdf2txt, i.e. it will only start running after pdf2txt has finished running
    dependencies:
      - pdf2txt
    container:
      image: hub.graviti.cn/miner/parser:2.0
      command: [python3]
      args: ["parser.py"]
  statistics:
    # statistics depends on parser, i.e. it only starts running after parser has finished running
    dependencies:
      - parser
    container:
      image: hub.graviti.cn/miner/statistics:2.0
      command: [python3]
      args: ["statistics.py"]
```

![](/files/RDCC5vh5UyE9FAR1HJBW)
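The `dependencies` fields determine the order in which the tasks run. As a rough illustration, the sketch below groups tasks into stages with Kahn's topological sort: each task appears only after everything it depends on, and tasks in the same stage do not depend on each other. This is a generic algorithm, not Graviti's scheduler, and the `DEPENDENCIES` dictionary is a hand copy of the YAML file above so that the example needs only the Python standard library.

```python
# Task dependencies copied by hand from the YAML file (task -> tasks it waits for).
DEPENDENCIES = {
    "scraper": [],
    "pdf2txt": ["scraper"],
    "parser": ["pdf2txt"],
    "statistics": ["parser"],
}


def execution_stages(dependencies):
    """Group tasks into stages using Kahn's algorithm.

    Every task in a stage has all of its dependencies in earlier stages,
    so tasks within one stage could in principle run in parallel.
    Assumes every dependency is itself a key of `dependencies`.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Number of unfinished dependencies for each task.
    remaining = {task: len(deps) for task, deps in dependencies.items()}
    # Reverse edges: which tasks are waiting on each task.
    dependents = {task: [] for task in dependencies}
    for task, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(task)

    ready = sorted(task for task, count in remaining.items() if count == 0)
    stages = []
    while ready:
        stages.append(ready)
        next_ready = []
        for task in ready:
            for child in dependents[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Tasks left unscheduled can only be explained by a cycle.
    if sum(len(stage) for stage in stages) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(DEPENDENCIES), start=1):
        print(number, stage)
```

In this workflow every task depends on the previous one, so each stage holds a single task: `scraper`, `pdf2txt`, `parser`, then `statistics`. Two tasks that depended on the same parent would land in the same stage and could run in parallel.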

g. Publish the workflow.

![](/files/XKJXXNQi4xgE3ttaMw5t)

#### 4. Run the workflow

a. On the Action page, click the workflow you have created and run it.

![](/files/MqYS4KzYmT2e8SrSpEfU)

b. Adjust the parameter if needed, for example, set `month` to 10, and click *Run*.

![](/files/eZNdTjZds4WQAyeMHjQk)

#### 5. View the results

a. Click the run record to open the workflow details page, then click *User Logs* to view the workflow logs.

![](/files/lVwqI0F2AbqyFrAnHRE5)

b. On the dataset details page, click *General* > *Dataset Preview* to view the statistics generated by this workflow.

![](/files/TZ5A2Kb99kwvYBNt3ERY)

![](/files/nLx8k3RVZsBbbneuVZik)


