This transformation extracts the noun and noun phrases from
text. Let's understand Term extraction transformation SSIS by simple example:
Step 1:
Create a new table and populate with some data:
CREATE TABLE
tblDocument (
ntDocumentID BIGINT
,vcDoucumentName
VARCHAR(100)
,vcData VARCHAR(MAX)
)
INSERT INTO
tblDocument
VALUES
(1,'Math','I am learning mathematics.')
,(2,'Che','I am learning chemistry.')
,(3,'Phy','I am learning physics.')
Step 2:
Drag any data source adapter in data flow task area. In this example in is
ADO .NET source adapter. It is connected with Sql server:
Step
3: Drag term extract transformation and
connect with data source adapter. After double clicking it will pop up term
exaction transformation editor. Select the any on text data type column from
where we want to extract the nouns or noun phrases.
Step 4: If
we to exclude a set of nouns or noun phrases. In this example we want to
exclude the noun "Chemistry".
We are creating a new table in database and populating exclude terms:
CREATE TABLE
tblExclude (
ntExcludeID BIGINT
,vcNoiseWord
VARCHAR(100)
)
INSERT INTO
tblExclude
VALUES(1,'chemistry')
Again come
back to SSIS. Under exclusion tab, create a new database connect where we have
just created a table named "tblExclude". Select the table and column
name from drop down.
Step 5:
Under
advance tab:
Term type: Noun
means a single word noun and Noun phrase means groups of words (Noun, adjective
etc.) In this example we are choose "Noun"
Score Type:
Frequency means total occurrence of a word. TFIDF is a fractional type
score. In this example we are choosing TFIDF.
Parameters:
Frequency threshold means minimum occurrence of word to exact from text
column. In this example we are choosing only occurrence is sufficient.
Step
6: Populate the output to any destination and run it. Sample output: