Creating a Featureset


A featureset is a subset of a dataset, made up of text records and their classes, that is ultimately used as the input for training models. Training on the entire dataset can reduce accuracy and raises the chance of including unclean data, so featuresets were introduced to help keep models accurate.

How to create a featureset in text extraction (named-entity recognition):

  1. Select Machine Learning from the left menu, then select “Featureset”. You will be directed to the ‘Featureset’ page.

  2. On the top right corner, select “Add Featureset”

  3. Name your new featureset and add its description

  4. From the drop-down menu, select the dataset the featureset will be drawn from

  5. On the ‘Feature Selection’ tab, select the categories from the ‘collect’ column. Then, label the column or merge* the collect and label columns

    * When merging, non-empty records in the label column replace the corresponding records in the collect column.

  6. Define the train and test sets. Skyl provides two policies for this: you can either split the dataset or explicitly extract records from the dataset.

    Define the train and test sets with the ‘Split the dataset’ policy:

    1. Choose the selection method that determines which rows are selected

    2. Enter the number* of records for selection.

      * The number of selected records cannot exceed the number of records in the dataset.

    3. Enter the train ratio (from 0.01 to 1). The train ratio is the proportion of the selected rows that go into the training set. The remaining rows go into the test set

    4. Enter a random seed so that you can reproduce results

    Define train and test sets with the ‘Explicit Extract from the Dataset’ policy:

    1. Choose the selection method and number of records under ‘Train Set’

    2. Choose the selection method and number of records for the test set

    3. Optional: apply a filter to the train and test sets to keep unwanted records out of the selection.

    4. Enter a random seed so that you can reproduce results

  7. Select ‘Create Featureset’ to create a new featureset
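To make the merge rule in step 5 and the ‘Split the dataset’ policy in step 6 concrete, here is a minimal Python sketch. It is not Skyl's implementation: the function names `merge_labels` and `split_dataset` are hypothetical, the selection method is assumed to be random sampling, and rounding the train size to the nearest whole record is an assumption as well.

```python
import random


def merge_labels(collect, label):
    """Merge rule from step 5 (hypothetical helper).

    A non-empty value in the label column replaces the value
    in the collect column; otherwise the collect value is kept.
    """
    return [lab if lab else col for col, lab in zip(collect, label)]


def split_dataset(records, n_records, train_ratio, seed):
    """'Split the dataset' policy from step 6 (hypothetical helper).

    Selects n_records rows, then sends a train_ratio share of them
    to the training set and the remainder to the test set.
    """
    if n_records > len(records):
        raise ValueError("n_records cannot exceed the number of records in the dataset")
    if not 0.01 <= train_ratio <= 1:
        raise ValueError("train_ratio must be between 0.01 and 1")

    rng = random.Random(seed)                  # fixed seed makes the selection reproducible
    selected = rng.sample(records, n_records)  # assumption: random selection method
    n_train = round(n_records * train_ratio)   # assumption: round to the nearest record
    return selected[:n_train], selected[n_train:]


# Example: select 10 of 20 records and keep 80% of them for training.
train_set, test_set = split_dataset(list(range(20)), n_records=10, train_ratio=0.8, seed=42)
print(len(train_set), len(test_set))
```

Running the split twice with the same seed returns the same train and test sets, which is the point of entering a random seed in the form.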

All created featuresets are displayed on the ‘Featureset’ page.

Each featureset has a status of creation:

  • ‘Completed’ means the featureset is ready to be used for ML training

  • ‘In Progress’ means the featureset is being prepared

  • ‘Failed’ means the featureset wasn’t created

To see the details and status of a featureset, select it on the ‘Featureset’ page to open the Featureset details slider.
