The Unscrambler Tutorials
Tutorials
   By CAMO Process AS
This manual was produced using ComponentOne Doc-To-Help 2005 together with Microsoft
Word. Visio and Excel were used to make some of the illustrations. The screen captures were taken
with Paint Shop Pro.
Trademark Acknowledgments
Doc-To-Help is a trademark of ComponentOne LLC.
Microsoft is a registered trademark and Windows 95, Windows 98, Windows NT, Windows
2000, Windows ME, Windows XP, Excel and Word are trademarks of Microsoft Corporation.
Paint Shop Pro is a trademark of JASC, Inc.
Visio is a trademark of Shapeware Corporation.
Restrictions
Information in this manual is subject to change without notice. No part of the documents that build it
up may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any
purpose, without the express written permission of CAMO Process AS.
Software Version
This manual is up to date for version 9.5 of The Unscrambler.
Document last updated on March 17, 2006.
        Note:
        Some model names in these tutorials assume that your files are stored on a computer which runs Microsoft
        Windows Operating Systems (Windows 9X, Windows 2K, Windows NT and Windows XP). Substitute the
        long filenames with names that comply with the DOS 8.3 rule if The Unscrambler is installed on a file server
        running Windows for Workgroups, or similar.
        We suggest that you copy the data files to a safe place. This way you can always start from scratch with the
        tutorials.
        Read the details below to understand which tutorials are useful in your case, and get some practical advice for
        running the tutorials.
        Depending on your degree of experience in using The Unscrambler and your fields of interest, here are the
        tutorials we recommend that you start with:
        [Figure: Wavelength axis with the Blue and Red bands marked]
        Seven solutions, or samples, have known concentrations, Y, of a, and can be used as the calibration samples.
        Three other samples have unknown concentrations, which should be predicted by the use of a regression
        model.
         Task
         Start The Unscrambler and log in.
         How to Do It
         Start The Unscrambler by double-clicking on The Unscrambler icon or selecting The Unscrambler from the
         Start menu in Windows. A list of the users that are registered in The Unscrambler is shown. (Lookup Image
         A002)
         Select yourself from the list of users and click OK. If your name does not appear, the system supervisor has to
         add your name to the list of users. You are asked to enter your password before The Unscrambler is opened if
         the system supervisor has set this option.
         Task
         Read the Tutor_a data file into the Editor and view some basic statistics of the data table.
         How to Do It
         Use File - Open to select the file Tutor_a in the Examples directory. This directory should be below the
         directory where you installed The Unscrambler. (Lookup Image A003)
        Some basic statistics like the Mean, Standard Deviation and Skewness of the samples and variables can be
        calculated and shown in a new Editor. Select View - Sample Statistics or View - Variable Statistics. A
        dialog pops up which asks you on which part of the data table to calculate the statistics: (Lookup Image
        A005)
        Accept the default choice (All samples or All variables) and click OK. A new Editor is launched with means,
        standard deviations, etc. (Lookup Image A006)
        Close the Editor window with the statistics before you continue.
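        For readers who want to see what these statistics amount to, here is a minimal sketch in Python. It is
        not The Unscrambler's own code. The table values are hypothetical, and the choice of a sample standard
        deviation (n - 1) and a moment-based skewness are assumptions, since the manual does not state which
        definitions are used.

```python
import statistics


def basic_stats(values):
    """Return mean, standard deviation and skewness for one row or column."""
    n = len(values)
    mean = statistics.fmean(values)
    # Sample standard deviation (divides by n - 1); this choice is an assumption.
    sdev = statistics.stdev(values)
    # Moment-based skewness m3 / m2**1.5; other definitions exist.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {"mean": mean, "sdev": sdev, "skewness": skewness}


# Hypothetical 4 x 3 table: rows are samples, columns are variables.
table = [
    [0.10, 0.40, 1.0],
    [0.20, 0.35, 2.0],
    [0.30, 0.30, 3.0],
    [0.40, 0.25, 4.0],
]

# "Variable Statistics" works down the columns, "Sample Statistics" along the rows.
variable_stats = [basic_stats(list(column)) for column in zip(*table)]
sample_stats = [basic_stats(row) for row in table]
```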
         How to Do It
         Choose Modify - Edit Set to launch the Set Editor. You see the list of already defined Variable Sets (which
         in this case is empty). (Lookup Image A007)
A007 Set Editor Dialog (Variable Sets)
A008 New Variable Set Dialog
         Press Add... to launch the New Variable Set dialog (Lookup Image A008), where you define the first
         variable Set:
         
          Name: Light Absorbance
           Data Type: Non-Spectra
           Interval: 1-2
         You can enter the variable numbers directly in the Set Interval field, or click Select to launch an interactive
         Editor where you mark the variables that belong to the Set. De-select variables you have marked by mistake by
         pressing <Ctrl> while you click on the variable you want to remove from the Set.
         Click OK. Back in the Set Editor, press Add... again to launch the New Variable Set dialog once more,
         where you define the second variable Set:
         
          Name: Constituent A
           Data Type: Non-Spectra
           Set Interval: 3
         Click OK.
         Change the Set type to Sample Sets by selecting Sample Sets from the drop-down list in the Set Editor.
         (Lookup Image A009)
        Press Add... to launch the New Sample Set dialog (Lookup Image A010), where you define the following
        Sample Sets in the same way as you defined the Variable Sets:
        
        Name: Calibration Samples
          Interval: 1-7
        
        Name: Prediction Samples
          Interval: 8-10
        Click OK when you are finished with the Set Editor.
        You will save a lot of energy in your own analyses later by defining the necessary Sets from the beginning. All
        analyses and plotting will be much easier for you to set up.
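        Conceptually, a Set is a named list of row or column numbers. The sketch below, which is illustrative
        only and not part of The Unscrambler, expands interval strings like the ones entered in the Set
        Interval field into 1-based indices. The function names parse_set_interval and select are hypothetical.

```python
def parse_set_interval(text):
    """Expand an interval string such as "1-2" or "9-13, 15-21" into 1-based indices."""
    indices = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(p) for p in part.split("-"))
            indices.extend(range(first, last + 1))
        else:
            indices.append(int(part))
    return indices


# The Sets defined in this tutorial, expressed as index lists.
variable_sets = {
    "Light Absorbance": parse_set_interval("1-2"),
    "Constituent A": parse_set_interval("3"),
}
sample_sets = {
    "Calibration Samples": parse_set_interval("1-7"),
    "Prediction Samples": parse_set_interval("8-10"),
}


def select(table, sample_indices, variable_indices):
    """Pick the sub-matrix defined by one Sample Set and one Variable Set."""
    return [[table[s - 1][v - 1] for v in variable_indices] for s in sample_indices]
```

        With the Sets defined once, any matrix used in an analysis is just a combination of one Sample Set and
        one Variable Set, for example select(table, sample_sets["Calibration Samples"],
        variable_sets["Light Absorbance"]).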
Remember to save the data table before you proceed by selecting File - Save or pressing the Save button.
        Task
        Find the regression of component a on the absorbance of red light (X1).
        How to Do It
        You do the regression by plotting the red light variable (X1) against component a:
         We want to do the univariate regression on the calibration samples only, and the Y-values are missing in the
         prediction Set.
         The plot displayed here appears (Lookup Image A012), but without the trend lines. Toggle the regression
         and/or target line on and off using View - Trend Lines - Regression Line/Target Line. The target line is
         very useful in predicted vs. measured plots.
         Statistics from the plot are shown in a special frame in the upper left corner. Toggle it on and off using
         View - Plot Statistics.
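         The regression line and the correlation reported in the plot statistics can be illustrated with an
         ordinary least-squares fit. The sketch below is a generic implementation, not The Unscrambler's code,
         and the seven value pairs are hypothetical stand-ins for the calibration samples (they are not the
         Tutor_a data).

```python
import math


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus the correlation of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


# Hypothetical red-light absorbances and concentrations for seven samples.
red = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
comp_a = [0.4, 1.1, 1.4, 2.1, 2.4, 3.1, 3.4]

slope, intercept, correlation = fit_line(red, comp_a)
```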
         Tutorial A - Calibration
         Now it is time to make the first multivariate model.
         Task
         Make a PLS regression model between the absorbance measurements and the concentration of a.
         How to Do It
         Activate the Tutor_a data table by clicking on it or selecting it from menu Window - 1 Tutor_a. Unmark
         the variables by pressing the <Esc> key.
         Select Task - Regression. Use the following parameters to define the model in the Regression dialog:
         (Lookup Image A013)
          Method: PLS1
          Samples: Calibration Samples [7]
          X-variables: Light Absorbance [2]
          Y-variables: Constituent A [1]
          Weights: All 1.0
          Validation method: Leverage Correction
          Num PCs (number of components): 2
          Model Size: Full
        The Center Data and Issue Warnings tick-boxes should always be checked, and Add Start Noise
        unchecked. This also applies to all models you make later.
        Leverage correction is a validation method that is quick, but may give too optimistic results. It is useful in the
        first runs of modeling, and when the data table is small. Therefore we use it here and in most of the later
        tutorials. You should use a more conservative validation method for your own data. When Leverage correction
        is used in a PLS model, an information dialog pops up (Lookup Image A014).
        Click OK to start the calibration.
        Task
         Interpret the residual variance curve
         Display the modeling results
         Study the Regression Coefficients plot
        You can also always display the modeling results of saved models from the Results menu. Select the kind of
        model you want to look at and mark the model in the Results dialog.
        Information about the model is available in the Information field. This is useful to answer questions like:
        Which sample Set did we use to make the model? Did we remember to weight the X-variables? In the Results
        dialog, a small preview of the residual variance curve shows the performance of the model.
        Go to Edit - Options and select Bars in the Plot Layout field if the plot is displayed as curves. (Lookup
        Image A018)
        Let the mouse cursor rest over one of the bars to see which variable it is. Click once more to get the object
        information window. The b-coefficient for the Red absorbance is 1.04, the b-coefficient for the Blue
        absorbance is -0.208 and the offset (B0) is 1E-06, i.e. approximately zero.
        The b-coefficients enable us to write the model equation relating the concentration of a to the Red and Blue
        light absorbances:
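        Written out with the coefficients quoted above, the equation has the form
        a = B0 + 1.04 * Red - 0.208 * Blue, with B0 = 1E-06. The short Python sketch below applies it. The
        function name and the example absorbance values are hypothetical; only the three coefficients come
        from the tutorial.

```python
# Coefficients read from the Regression Coefficients plot in this tutorial.
B_RED = 1.04
B_BLUE = -0.208
B0 = 1e-06


def predict_concentration(red, blue):
    """Apply the model equation a = B0 + B_RED * red + B_BLUE * blue."""
    return B0 + B_RED * red + B_BLUE * blue


# Hypothetical absorbance readings for one new sample.
example = predict_concentration(red=2.0, blue=0.5)
```

        The negative coefficient for Blue means that, for a given Red absorbance, a higher Blue absorbance
        lowers the predicted concentration of a.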
        Tutorial A - Prediction
        The purpose of making a regression model is usually to predict the response value of new samples that are
        measured in the future.
        Task
        Use the calibration model to predict the concentration of a in the three unknown samples in the data table.
        How to Do It
        Use menu Window - 1 Tutor_a to activate the data table, but do not close the Viewer with the regression
        coefficients. Then, select Task - Predict. Use the parameters below to make the necessary specifications in
        the Prediction dialog: (Lookup Image A019)
          Samples: Prediction Samples [3]
          X-variables: Light Absorbance [2]
          Y-reference: no selection (do not include Y-reference values)
          Model Name: Tutorial A
          Number of Components: 2
        Press Find to select the model if you do not remember its name or are not sure where it is. You may also enter
        the model name directly into the field. Click OK to start the prediction.
        Task
        Evaluate the prediction results by looking at the plots Predicted vs. Measured from the PLS calibration stage
        and Predicted with Deviation from the prediction stage.
        How to Do It
        First, let us look at the predictions you just made. Press View in the Prediction Progress dialog to open the
        Viewer with the prediction results: (Lookup Image A020)
        Save the results file under the name Tutorial A Prediction 1 before you proceed.
        Activate the Viewer with the regression coefficients (Window - 2 Tutorial A). This is one way to go back
        to the model results. Then select Plot - Predicted vs Measured and specify the following parameters in the
        Predicted vs Measured dialog: (Lookup Image A021)
          Plot type: Predicted vs. Measured
          Y-variable: 1; Comp a
          Components: 2
          Samples: Calibration
        Click OK.
        The Predicted vs Measured plot appears. (Lookup Image A022)
        Use View - Trend Lines to toggle the regression and/or target line on and off.
        Use View - Plot Statistics to toggle the statistics windows on and off.
        You see that the prediction by the PLS model is extremely good in this case. Compare this multivariate
        regression with the univariate regression result, which used variable Red only to predict Comp a (Lookup
        Image A012). The correlation between predicted and measured is higher when the multivariate model, based
        on both Red and Blue, is used for prediction.
        Task
        Make a new calibration with the same parameters as last time, but change the validation method to cross
        validation.
        How to Do It
        Activate the Editor (Window - 1 Tutor_a) and select Task - Regression. Use the following parameters:
        (Lookup Image A023)
         Method: PLS1
         Samples: Calibration Samples [7]
         X-variables: Light Absorbance [2]
         Y-variables: Constituent A [1]
         Weights: All 1.0
         Validation method: Cross Validation
         Num PCs (number of components): 2
        Task
        Use some of the plot options.
        How to Do It
        Press View in the PLS1 Regression Progress dialog to launch the regression overview of the latest model.
        (Lookup Image A025)
        Activate the Scores plot, which is the upper left plot, by clicking in it.
        Select Edit - Options and go to the Sample Grouping tab. Tick the box Enable Sample Grouping and
        choose Cross Validation Segments in the Group By field. (Lookup Image A026)
        Click OK. (Lookup Image A027)
        You see that each segment from the cross validation has its own color in the score plot. In this example each
        segment had only one sample, but in other cases this is a good way to see how the segments are distributed in
        the whole population of samples.
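        The idea of one-sample segments can be sketched in a few lines of Python. This is an illustration
        under stated assumptions, not The Unscrambler's algorithm: a simple straight-line fit stands in for
        the PLS model, and the function names are hypothetical.

```python
import math


def full_cv_segments(n_samples):
    """Yield (calibration, validation) index lists, leaving one sample out per segment."""
    for left_out in range(n_samples):
        calibration = [i for i in range(n_samples) if i != left_out]
        yield calibration, [left_out]


def fit_line(x, y):
    """Least-squares line; a stand-in for the real regression method."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_rmse(x, y):
    """Refit on each calibration segment and predict the sample that was left out."""
    squared_errors = []
    for calibration, validation in full_cv_segments(len(x)):
        slope, intercept = fit_line([x[i] for i in calibration],
                                    [y[i] for i in calibration])
        for i in validation:
            squared_errors.append((y[i] - (slope * x[i] + intercept)) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

        With seven calibration samples this produces seven segments, so each sample is predicted exactly once
        by a model that never saw it.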
        Activate the Predicted vs. Measured plot (lower right corner) and select Edit - Options. Enable sample
        grouping the same way as you did before, but group by Value of Variable this time, choose X-variable 1 and
        select to generate three groups. (Lookup Image A028)
        Use the Next Horizontal PC and Previous Horizontal PC buttons to display the active plot
        Predicted vs Measured for one or two Principal Components. (Lookup Image A030)
        Save this last model and give it a meaningful name, if you want to take a look at it later from Results -
        Regression without remaking the whole model.
        We are also interested in finding a way to rationalize quality control, since the use of taste panels is very
        costly. Therefore, we will try to find instrumental measurement variables to replace some of the sensory
        testing. Problem II is thus to explore the relationships between sensory variables and chemical/instrumental
        measurements.
        Finally we would like to predict consumer preference for raspberry jam from descriptive sensory analysis. This
        is Problem III.
          
          Insert category variables;
          Define Sets;
          Decompose by PCA;
          Interpret scores and loadings;
          PLS regression;
          Export models;
          Predict response values from new samples;
          Estimate regression coefficients;
          Find optimal number of components;
          Numerical results.
18 Quality Analysis with PCA and PLS (Tutorial B)                                         The Unscrambler Tutorials
        Agronomic production variables
        The samples are taken from four different cultivars, at three different harvesting times:
        No    Name     Cultivar    Harvest time
        1     C2-H1    2           1
        2     C4-H1    4           1
        3     C3-H3    3           3
        4     C3-H1    3           1
        5     C1-H1    1           1
        6     C4-H3    4           3
        7     C2-H3    2           3
        8     C4-H2    4           2
        9     C1-H2    1           2
        10    C3-H2    3           2
        11    C1-H3    1           3
        12    C2-H2    2           2
        Note that the agronomic production variables are not used as input variables in any of the matrices, but they are
        known information which is very valuable for the interpretation of the results of the data analysis. They will
        be utilized as category variables later.
        Note that the variable numbers in that table refer to positions within the Instrumental Variable set, and
        not to the variable numbers in the original data table.
        Note that the variable numbers in that table refer to positions within the Sensory Variable set, and not to
        the variable numbers in the original data table.
        Variable Set Preference
        114 representative consumers tasted the 12 jam samples and gave them preference scores on a scale from 1 to 9.
        The average over all consumers for each sample is given in the data table.
        Task
        Insert two category variables Cultivar and Harvest Time.
        How to Do It
        Open the data file Tutor_b by selecting File - Open. (Lookup Image B001)
        Activate a cell in the first column of the table as we will insert our category variables at the beginning of the
        table. Then, follow these five steps:
           1. Select Edit - Insert - Category Variable. The dialog: Category Variable Wizard - Enter
              Variable Name and Choosing Method pops up. (Lookup Image B002)
           2. Enter Category Variable Name Cultivar and choose I want to specify the levels manually. Press
              Next.
           3. This launches the Specify Levels dialog, (Lookup Image B003) where you must specify the levels
              of your new category variable. Use C1, C2, C3, and C4 as the level values for Cultivar. Type in the
              name of the level in the Level name field and press Add to add the level; one for each cultivar.
           4. Press Finish and the category variable is inserted into the Editor. (Lookup Image B004)
              Note that category variable names appear in blue fonts in the Editor to distinguish them from ordinary
              variables.
           5. All cells are filled with m to denote missing. Enter the level for a sample by double-clicking the
              category variable cell. The cell is highlighted and a drop-down list appears. Click to see the available
              levels and click on the correct one. Use the arrow keys to move up and down in the list. The cultivar
              (and harvest time) values are seen in the sample name.
        B002 The Category Variable Wizard - Enter Variable Name and Choosing Method dialog
        B003 The Category Variable Wizard - Specify Levels dialog
        B004 The Tutor_b data table displayed in the Editor (with Cultivar)
        B005 The Tutor_b data table displayed in the Editor (after insertion of Cultivar and Harvest Time)
        Insert the category variable Harvest Time and fill in the correct Harvest Time levels by repeating the
        five-step procedure above (Lookup Image B005).
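        Because the cultivar and harvest time are encoded in each sample name, the levels could also be derived
        programmatically. The Python sketch below is illustrative only and is not a feature of The Unscrambler.
        The names are taken from the sample table in this tutorial; accepting either a hyphen or an underscore
        as separator is an assumption, and levels_from_name is a hypothetical helper.

```python
# Sample names as listed in the tutorial's sample table (samples 1 to 12).
SAMPLE_NAMES = [
    "C2-H1", "C4-H1", "C3-H3", "C3-H1", "C1-H1", "C4-H3",
    "C2-H3", "C4-H2", "C1-H2", "C3-H2", "C1-H3", "C2-H2",
]


def levels_from_name(name):
    """Split a name such as "C2-H1" (or "C2_H1") into its cultivar and harvest levels."""
    cultivar, harvest = name.replace("_", "-").split("-")
    return cultivar, harvest


# One level per sample, in table order, for each category variable.
cultivar_levels = [levels_from_name(name)[0] for name in SAMPLE_NAMES]
harvest_levels = [levels_from_name(name)[1] for name in SAMPLE_NAMES]
```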
        Tutorial B - Check Variable Sets
        In The Unscrambler, matrices are defined by Sample and Variable Sets. It is a good habit to define all Sets
        before any analyses are performed.
        Task
        Check that the three Variable Sets: Instrumental, Sensory and Preference were defined.
        How to Do It
        Select Modify - Edit Set to open the Set Editor. Check that the three following Variable Sets were defined:
        (Lookup Image B006)
          Set name: Instrumental
          
            Data Type: Non-Spectra
            Size: 6 variables
            Interval: 3-8
          
          Set name: Preference
            Data Type: Non-Spectra
            Size: 1 variable
            Interval: 14
          
          Set name: Sensory
            Data Type: Non-Spectra
            Size: 12 variables
            Interval: 9-13, 15-21
B006 The Set Editor dialog with three User-defined Variable Sets
        These sets were defined for you, and were saved automatically when the data table Tutor_b was saved. When
        working on your own data, press the Add button in the Set Editor dialog to create sets. This launches the
        New Variable Set dialog. Enter the Name and Data type of your set. Press Select to launch an Editor where
        you can mark the variables that belong to the Set you are defining. You may alternatively enter the set
        intervals directly in the Set Interval field.
        Note that the Set Sensory is not continuous, but consists of two ranges in the data table. Together, the
        variables from these two ranges define variable set Sensory.
        Task
        Define two Sample Sets: Calibration Sam and Prediction Sam.
        How to Do It
        In the Set Editor dialog, change the set type to Sample Sets and define the following parameters:
         
         Name: Calibration Sam
           Set Interval: 1-12

         Name: Prediction Sam
           Set Interval: 13-20
        To do this: Press the Add button in the Set Editor dialog. This launches the New Sample Set dialog. Enter
        the Name of the set. Press Select to launch an Editor where you can mark the samples that belong to the Set
        you are defining. You may alternatively enter the set intervals directly in the Set Interval field.
Save the data file in the Editor before you continue with the tutorial.
        Task
        Make a PCA model using the Set Sensory (i.e. one data matrix is decomposed by PCA).
        How to Do It
        Select Task - PCA. Specify the following parameters in the Principal Component Analysis dialog:
        (Lookup Image B007)
          
          Samples: Calibration Sam [12]
          Variables: Sensory [12]
          Weights: All 1.0
          Validation method: Cross Validation
          Num PCs: 8
        B007 The Principal Component Analysis dialog
        B008 The Cross Validation Setup dialog
        Press Setup in the Validation Method field to specify in the Cross Validation Setup dialog that Full
        Cross Validation is to be used (Lookup Image B008). This validation method is more time consuming than
        leverage correction, but the estimate of the residual variance is more reliable.
        No weighting is used in this model, i.e. all weights are set to 1.0, to see which variables do actually vary the
        most. However, sensory variables are often weighted when you investigate relationships with other variables.
        The most common weighting to use is 1/SDev.
        Click OK to start the PCA. You see how The Unscrambler makes a PCA model for each segment, twelve in
        all. Finally, the global model is made and the residual variance curve is shown for this model.
        Task
        Interpret the residual variance curve in the PCA Progress dialog. This displays the progress of the modeling.
        The residual variance should decrease as the number of PCs in the model increases, and should be as small as
        possible.
        How to Do It
        The residual variance decreases until PC 5 is reached. Then the residual variance increases again due to
        overfitting. The important decision we face now is to select the optimal number of PCs in this model.
        The lowest residual variance is found with 5 PCs, but the residual variance in a model using 3 PCs is not much
        worse. A simple model is more robust than a complex one, and easier to interpret. We therefore choose to work
        with a model consisting of 3 PCs.
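        The trade-off described here can be expressed as a simple rule of thumb: take the smallest number of
        PCs whose residual variance is close enough to the minimum. The Python sketch below shows one such
        heuristic. It is not the rule The Unscrambler applies; the residual variance values and the 5%
        tolerance are hypothetical and chosen only to mirror the situation in the text.

```python
def choose_num_pcs(residual_variance, tolerance=0.05):
    """Pick the smallest number of PCs whose residual variance is near the minimum.

    residual_variance[k] is the validated residual variance with k PCs (k = 0 first).
    "Near" means within tolerance * residual_variance[0] of the lowest value.
    """
    total = residual_variance[0]
    threshold = min(residual_variance) + tolerance * total
    for k, value in enumerate(residual_variance):
        if value <= threshold:
            return k


# Hypothetical curve: lowest at 5 PCs, rising again at 6 PCs (overfitting).
curve = [1.00, 0.45, 0.20, 0.08, 0.07, 0.06, 0.09]

lowest = curve.index(min(curve))   # number of PCs with the lowest residual variance
chosen = choose_num_pcs(curve)     # simpler model that is almost as good
```

        A larger tolerance favours simpler models; a tolerance of zero always returns the number of PCs with
        the lowest residual variance.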
        Note that the residual variance shown here is the residual variance for X, while the regression models made in
        Tutorial A and later in this tutorial show the residual variance for Y. This reflects the difference between PCA
        and regression models. PCA focuses on one matrix, X, containing variables which describe the samples;
        regression models like PLS focus on a second matrix, Y, which contains variables to be predicted.
        Press View to take a closer look at the other model results. The residual variance turns up again in the PCA
        Overview, (Lookup Image B009) which consists of four plots that reveal a lot of information.
        The Viewer which you are now looking at has the most common model results available for you as predefined
        plots in the Plot menu. You can always get this display of your model back via the Results menu. Let us look
        at the different plots in the PCA overview.
        Task
        Change the residual variance plot to an explained variance plot.
        How to Do It
        Activate the lower right plot by clicking in it. Select View - Source and change this option from Residual
        Variance to Explained Variance. You also have access to this menu option by right-clicking with the
        mouse in the plot or by using the corresponding toolbar button for explained variance. A more elaborate
        way of doing this is to make the plot once again using Plot - Variances and RMSEP, but the other ways to
        change the plot are preferred because they are faster.
        The residual variance is now converted to explained variance (Lookup Image B010). The information is the
        same, but presented in another way. The residual variance is well suited to find the optimal number of PCs to
        use in a model, while the explained variance is a better measure to tell how much of the variation in the data
        the model describes.
        You see that a model with 3 PCs describes almost 92% of the validated variation in the data; for calibration it
        is 97%. You can get the value by clicking on the data point in the plot. Use the corresponding toolbar buttons
        to switch between plotting only the calibrated variance curve, only the validated variance curve, or both.
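        The relationship between the two presentations can be sketched as follows. This Python snippet assumes
        that the residual variance with 0 PCs represents the total variance, which is the usual convention but
        is not stated in this tutorial; the numbers are hypothetical.

```python
def explained_variance_percent(residual_variance):
    """Convert a residual variance curve into explained variance in percent.

    residual_variance[k] is the residual variance with k PCs; the value at
    k = 0 is taken as the total variance (an assumption).
    """
    total = residual_variance[0]
    return [100.0 * (1.0 - value / total) for value in residual_variance]


# Hypothetical validated residual variances for 0, 1, 2 and 3 PCs.
residual = [1.00, 0.45, 0.20, 0.08]
explained = explained_variance_percent(residual)
```

        In this hypothetical curve a residual variance of 0.08 after 3 PCs corresponds to about 92% explained
        variance, which is the kind of figure read off the plot above.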
B010 The PCA Explained Variance plot
B011 The PCA Scores plot
        Task
        Interpret the Scores plot. Use different plot options to ease interpretation.
        How to Do It
        The score plot shows the projected locations of the objects onto the PCs, and by studying patterns you may
        find the meaning of the PCs. (Lookup Image B011) There are many patterns to be detected from score (and
        loading) plots.
        On the score plot, you will notice that the 12 samples are not arranged in a random way on the map. When you
        move from the left to the right part of the plot, you first encounter samples harvested on time H1, then H2 and
        finally H3. Moreover, if you now move from the top to the bottom, you see several C4 samples first, then C3,
        then C2, and finally C1.
        The category variables that were inserted into the data table will make things even clearer. Select Edit -
        Options. Select the Sample Grouping tab and tick Enable Sample Grouping. Choose the following
        options: (Lookup Image B012)
         
         Separate with: Colors
         Group By: Value of Variable; Levelled Variable
         Markers Layout: Name
        You may press Select in the Group By field to select the levelled variable that you want to use as a marker. It
        launches an Editor where you can mark the category variable of your choice, for example variable 1: Cultivar.
        B012 The Sample Grouping sheet in the Options dialog
        B013 The Options dialog, Markers Layout Before and After adjusting the Markers Layout
        Now, we are going to alter the Markers Layout a little. The sample names are entered with an underscore in
        the data table. We are going to remove this underscore from the markers in the plot.
        Click once in the fifth box in the Name sequence. All boxes that are ticked correspond to letters in the sample
        names that will be displayed. Press the <Ctrl> key and click the third box to remove the third character (i.e. the
        underscore). (Lookup Image B013)
        Note:
        The first click marks the beginning; the second click marks the end of a range. Make it a habit to click twice
        whenever you want to mark a range of marker characters; once to mark the beginning of the range and once
        again to mark the end of the range. Press the <Ctrl> key at the same time as you click a box to (de-)select a box
        in the marker.
        Press OK. The Scores plot is updated with the Sample grouping options. Each level of the category variable is
        assigned a unique color, and the markers in the plot are displayed without the underscore. (Lookup Image
        B014)
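        The effect of the ticked boxes in the Markers Layout can be pictured as a character mask applied to
        each sample name. The Python sketch below is only an illustration of that idea; build_marker is a
        hypothetical helper, and the sample name is written with an underscore as described in the text.

```python
def build_marker(sample_name, selected_positions):
    """Keep only the characters at the ticked (1-based) positions of the sample name."""
    return "".join(
        character
        for position, character in enumerate(sample_name, start=1)
        if position in selected_positions
    )


# Boxes 1 to 5 ticked, then box 3 (the underscore) de-selected with <Ctrl> + click.
ticked = {1, 2, 4, 5}
marker = build_marker("C2_H3", ticked)
```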
        B014 PCA, Scores plot with samples grouped by colors
Try to perform a new sample grouping, this time upon category variable Harvest Time.
        Task
        Interpret variable relationships in the correlation loadings plot.
        How to Do It
        Activate the X-Loadings plot by clicking on it, then use menu View - Correlation Loadings or the
        corresponding shortcut button. The Correlation Loadings plot is the most appropriate plot for studying
        variable correlations. (Lookup Image B015)
        B015 PCA, Correlation Loadings (X) plot (PC1 vs PC2)
        The plot shows that two variables (REDNESS and COLOUR) have an extreme position to the right of the plot
        along PC1. They are close to each other, and far from the center, very close to the 100% explained variance
        circle; they correlate positively. This also means that objects lying to the right of the score plot have higher
        values for those two variables.
        Along the vertical axis (PC2), you notice two variables lying at the top (R.SMELL and R.FLAV), opposed to
        variable OFF FLAV which lies at the bottom. So we see that raspberry smell and flavor correlate positively
        with each other, and negatively with off-flavor. Thus, the more you move up on the score plot, the more the
        smell and flavor of the samples will be characteristic of raspberries.
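        A correlation loading is the correlation between a variable and the scores of a PC. The Python sketch
        below shows that computation for two PCs. It is a generic illustration, not The Unscrambler's
        implementation; the score vectors and the variable are hypothetical, and the explained fraction is
        only meaningful when the two score vectors are uncorrelated, as they are here.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equally long lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    return suv / math.sqrt(suu * svv)


def correlation_loadings(variable, scores_pc1, scores_pc2):
    """Return the variable's coordinates in the plot and its explained fraction."""
    r1 = pearson(variable, scores_pc1)
    r2 = pearson(variable, scores_pc2)
    explained = r1 ** 2 + r2 ** 2   # 1.0 would place the variable on the 100% circle
    return r1, r2, explained


# Hypothetical, uncorrelated score vectors for four samples.
t1 = [-1.0, 0.0, 1.0, 0.0]
t2 = [0.0, -1.0, 0.0, 1.0]
# Hypothetical variable that follows PC1 closely.
variable = [-2.0, 0.1, 2.0, -0.1]

r1, r2, explained = correlation_loadings(variable, t1, t2)
```

        Two variables that lie close together near the outer circle have similar correlation loadings and are
        therefore positively correlated; variables on opposite sides are negatively correlated.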
        Task
        Relate Scores (samples) information to Loadings (variables) information.
        How to Do It
        The Scores plot and Correlation Loadings plot show that samples C2H3 and C1H3 have strong color and
        redness intensities, while sample C1H2 has a pronounced off-flavour. Samples located in one area of the
        two-vector score plot generally have high values for the variables pointing in the same direction in the
        loading plot, provided that the plotted PCs describe a large portion of the variance.
        PC 3 describes the variation in sweetness, bitterness and chewing resistance. Confirm this by activating the
        loading plot (upper right quadrant) and selecting Plot - Loadings. Display PC 1 vs. PC 3 by changing
        Vector 2 in the Components field in the Loadings dialog to 3. (Lookup Image B016)
        B016 The Loadings dialog
        On this new plot, the horizontal axis is unchanged (PC1) and the vertical axis is PC3. Use View -
        Correlation Loadings to better interpret variable correlations along PC3.
        Task
        Interpret the influence plot, which is used to look for outliers.
        How to Do It
        The influence plot is displayed in the lower left quadrant. The strongest outliers are placed in the upper right
        corner of the plot, and have a large leverage and a high residual variance. In this particular case, we do not see
        any outliers. (Lookup Image B017)
        B017 PCA, Influence Plot
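The two axes of an influence plot can be sketched as follows, assuming the common textbook definitions: leverage as 1/n plus the normalized squared scores, and residual variance as the mean squared X-residual per sample. The Unscrambler may apply degrees-of-freedom corrections that this sketch ignores, and the data below are made up.

```python
import numpy as np

def influence_measures(X, n_components):
    """Return (leverage, residual variance) per sample for a PCA model."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]      # scores
    P = Vt[:n_components].T                         # loadings
    # Leverage: distance of each sample to the model center, in score space.
    leverage = 1.0 / n + ((T ** 2) / (T ** 2).sum(axis=0)).sum(axis=1)
    # Residual variance: what the model leaves unexplained for each sample.
    residuals = Xc - T @ P.T
    residual_variance = (residuals ** 2).mean(axis=1)
    return leverage, residual_variance

# Made-up data: 12 samples, 6 variables, with sample index 5 shifted far away.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 6))
X_demo[5] += 10.0

leverage, residual_variance = influence_measures(X_demo, n_components=2)
suspect = int(np.argmax(leverage))
print(suspect, np.round(leverage, 2))
```

A sample that pulls a PC towards itself, as the shifted sample does here, shows up with a high leverage even if its residual is small. Samples that are high on both axes are the ones the text calls the strongest outliers.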
        Close the PCA overview and save the results file with the name Tutorial B PCA. Close all other Viewers you
        may have open at the same time.
        Task
        Make a PLS2 regression model that predicts the variations in sensory variables from instrumental and chemical
        variables.
        How to Do It
        Select Task - Regression. Specify the following parameters in the Regression dialog: (Lookup Image
        B018)
         
         Method: PLS2
         
         Samples: Calibration Sam [12]
         
         X-variables: Instrumental [6]
         
         Y-variables: Sensory [12]
         
         Weights: All 1/SDev in X and Y
         
         Validation Method: Cross Validation
         
         Number of components: 6
        Press Weights to launch the Set Weights dialog. (Lookup Image B019) Press All to change the weighting
        of all variables at the same time. You can also select the variables by clicking on them in the list. Remember to
        hold <Ctrl> down while you select several variables. Choose the A / (Sdev +B) radio button. Use constants A
        = 1 and B = 0.
        We are weighting all variables by dividing them by their own standard deviations. This allows all variables
        to contribute to the model, regardless of whether they have a small or large standard deviation from the outset;
        what really counts is the systematic variation.
        Press Update and see the weights change in the list, then click OK.
        Remember to adjust the weights for both X-variables and Y-variables.
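The A / (Sdev + B) weighting with A = 1 and B = 0 can be sketched in a few lines of NumPy. This is an illustration with made-up numbers; the use of the sample standard deviation (ddof=1) is an assumption, not something the tutorial states.

```python
import numpy as np

def sdev_weights(X, A=1.0, B=0.0):
    """Weights A / (SDev + B) for each column of X (sample standard deviation)."""
    return A / (X.std(axis=0, ddof=1) + B)

# Made-up table: two variables on very different scales.
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 200.0],
                   [4.0, 400.0]])

weights = sdev_weights(X_demo)       # A = 1, B = 0, i.e. 1/SDev
X_weighted = X_demo * weights

# After weighting, every variable has unit standard deviation,
# so no variable dominates the model just because of its units.
print(X_weighted.std(axis=0, ddof=1))
```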
        Press Setup to launch the Cross Validation Setup dialog and choose Full Cross Validation as the cross
        validation method. Normally it is more practical to use leverage correction in the first calibration runs to detect
        outliers etc., and re-calibrate with a proper validation method (e.g. cross validation) as the last step.
        Click OK in the regression dialog when you have set all parameters. The PLS2 Regression Progress
        dialog shows how the different segments are being made before the final model is calibrated. The prediction
        error is minimized after five PCs, but the first local minimum is 0.84 after two PCs, which we must choose to
        avoid overfitting.
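The rule "take the first local minimum rather than the global minimum" can be sketched as below. Only the value 0.84 at two PCs comes from the tutorial; all other numbers in the curve are invented for illustration, and the function name is hypothetical.

```python
def first_local_minimum(curve):
    """Index of the first point lower than its left neighbour and not higher than its right one.

    Falls back to the global minimum if the curve has no interior local minimum.
    """
    for k in range(1, len(curve) - 1):
        if curve[k] < curve[k - 1] and curve[k] <= curve[k + 1]:
            return k
    return min(range(len(curve)), key=lambda k: curve[k])

# Validation residual per number of PCs, starting at 0 PCs.
# Invented values, except 0.84 at two PCs.
residual_curve = [1.00, 0.90, 0.84, 0.86, 0.85, 0.80, 0.82]

chosen = first_local_minimum(residual_curve)
global_best = min(range(len(residual_curve)), key=lambda k: residual_curve[k])
print(chosen, global_best)
```

With this curve the global minimum sits at five PCs, but the first local minimum at two PCs is the more conservative choice, in line with the reasoning above.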
        This Viewer is your gateway to your model. You can choose the most useful and common predefined result
        plots, e.g. loading weights and residuals, from the Plot menu. At later stages you can always review this model
        by using Results - Regression and selecting this results file.
        Before we continue with the interpretation, let us take a look at the warnings that were issued during the
        calibration.
        Task
        Interpret the warnings given for this model in the Warning List.
        How to Do It
        Use Window - Warning List to display the warnings at the bottom of the screen. In this case, the warnings
        relate to the variance curves and do not indicate any outliers in the data set.
        You may also want to take a look at the actual tests that lead to these warnings by looking at the outliers list.
        Click the Outliers button in the warning window to see the outlier tests displayed in the Outlier List dialog.
        For details on how to find and identify outliers, see Tutorial C.
        Task
        Interpret the explained variance curve, which can be shown as residual variance, as it was in the PLS
        Regression Progress dialog, or as explained variance. The two different views are useful for different
        tasks.
        How to Do It
        The residual variance plot in the lower left corner is the same as you saw in the PLS Regression Progress
        dialog. We saw that a local minimum was reached with two PCs. Now we want to look at how well each of
        the first six Y-variables is described by the model. We do this by looking at the explained variance.
        Activate the lower left window. Select Plot - Variances and RMSEP and use the X- or Y-variance tab,
        where you specify the following parameters: (Lookup Image B021)
         
         Variables: Y; 1-6
         
         Samples: Validation
        And check the Total box. Press OK.
        B021 Variances and RMSEP dialog, X- or Y-variance sheet
        B022 PLS2, Explained Validation Variance Plot displayed for the Total model and for the six
        individual Y-variables
        Make sure that the plot shows the Explained Variance. If not, change it by selecting View - Source -
        Explained Variance. (Lookup Image B022)
        We concluded from the residual variance curve that two PCs were optimal. Here, we see that the variables that
        are well described are already well described with two PCs. About 85% of the color variation (variables 1 and 2), and 80% of
        the variation in sweetness (variable 6) can be explained by a combination of the chemical and instrumental
        variables.
        Note that only 23% of the total Y-variance is explained by the model using two PCs.
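The difference between the explained variance of individual Y-variables and of the total can be illustrated with a small sketch. It assumes the simple definition "one minus residual sum of squares over total sum of squares" and ignores any degrees-of-freedom correction The Unscrambler may apply. The numbers are made up.

```python
import numpy as np

def explained_variance(Y, Y_hat):
    """Explained variance per Y-variable and for all Y-variables together."""
    Yc = Y - Y.mean(axis=0)
    ss_total = (Yc ** 2).sum(axis=0)
    ss_resid = ((Y - Y_hat) ** 2).sum(axis=0)
    per_variable = 1.0 - ss_resid / ss_total
    total = 1.0 - ss_resid.sum() / ss_total.sum()
    return per_variable, total

# Made-up responses: three Y-variables with the same variance.
Y_demo = np.array([[0.0, 1.0, 3.0],
                   [1.0, 3.0, 0.0],
                   [2.0, 0.0, 2.0],
                   [3.0, 2.0, 1.0]])

# Hypothetical predictions: the first variable is predicted perfectly,
# the other two only by their mean value.
Y_hat_demo = Y_demo.copy()
Y_hat_demo[:, 1:] = Y_demo[:, 1:].mean(axis=0)

per_variable, total = explained_variance(Y_demo, Y_hat_demo)
print(per_variable, total)
```

One well-described variable among several poorly described ones gives a low total, which is how a model can explain most of the color variation while the total explained Y-variance stays small.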
        Task
        Interpret the score plot.
        How to Do It
        The score plot shows patterns in the samples. This is often difficult to see without some help. Use the category
        variables as markers the same way you did in Tutorial B - Interpretation of the Score Plot for the PCA
        model, using Edit - Options from the Scores plot, and selecting the relevant options in the Sample
        Grouping tab of the Options dialog.
        You see that PC 1 describes the harvesting time. Harvest time 1 is placed to the left in the plot and harvest time
        3 to the right. The score plot does not reveal information about the cultivars.
        A comparison with the loading plot gives more information. Try to interpret the two plots (Scores and
        Loadings) together.
        Task
        Interpret the loadings plot.
        Interpret the loading weights plot.
        How to Do It
        The loadings plot is located in the upper right quadrant. Activate it and select Plot - Loading Weights.
        (Lookup Image B023) On the General sheet, make sure you plot both X and Y, which gives you the
        loading weights for X and the loadings for Y. Plot PC1 vs. PC2 in the upper right corner.
        General sheet                                   Loadings Plot
        Draw straight lines between the variables through the origin. Variables along the same line, far from the origin,
        may be correlated. (Negatively correlated when situated on opposite sides of the origin.) (Lookup Image
        B024)
        It seems that the spectrophotometric color measurements (L, A, and B) are strongly negatively correlated with
        color intensity and redness. Sweetness is, as expected, rather strongly negatively correlated with measured
        Acidity. But the R. Flavor shows weak correlation to the PLS-factors (near origin = low PLS loadings).
        We learned in Problem I that the jam quality varied both with respect to color, flavor, and sweetness. But the
        results so far in Problem II show that the chemical and instrumental variables mainly predict variations in color
        and sweetness (which is indicated by the low explained Y-variance of Flavor). This means that we cannot
        replace the Y-variable Flavor with the present set of X-variables. There is no information in the chemical and
        instrumental measurements we have made that is related to the Flavor content in the jam samples.
        Use of other instrumental X-variables, e.g. gas chromatographic data, could probably have increased the flavor
        prediction ability of the raspberry jam data.
        Task
        Interpret the predicted vs. measured plot.
        How to Do It
        The predicted vs. measured plot in the regression overview currently displays the results for the first Y-
        variable. (Lookup Image B025) Use Plot - Predicted vs Measured to see how the predictions are for
        other variables. Make sure to display these plots for two PCs, as this is the right number of PCs for our model.
        B025 PLS2, Predicted vs Measured Plot for variable Redness, model with two PCs
Close the results Viewer and save it with the name Tutorial B Inst-Sens.
        Task
        Make a PLS1 regression model of the relationships between sensory data and preference.
        How to Do It
        From the Editor, select Task - Regression, and specify the following parameters in the Regression dialog:
          
          Method: PLS1
          
          Samples: Calibration Sam [12]
          
          X-variables: Sensory [12]
          
          Y-variables: Preference [1]
          
          Weights: All 1/SDev in X and Y
          
          Validation method: Full Cross Validation
          
          Uncertainty test: on
          
          Number of components: 6
        Press Weights to launch the Set Weights dialog, and weight all variables with 1/Sdev to get them in the
        same range and let them contribute equally in the modeling.
        Press OK. In the PLS1 Regression Progress dialog, we see that the residual variance seems to decrease all
        the time, which may lead us to think that we should use five or six PCs for predictions. Let us look at the residual
        variance plot in the regression overview before we decide upon the number of PCs to use. Click View to open the
        regression overview. (Lookup Image B026)
        Task
        Interpret the regression overview plots, which display the necessary plots to diagnose the model quickly.
        How to Do It
        We are mostly interested in how well the model can do the predictions. We therefore only comment on the
        residual variance and the Predicted vs Measured plots.
        B027 PLS1, Residual Validation Variance Plot
        B028 The Predicted vs Measured dialog
        Predicted vs Measured
        Activate the predicted vs. measured plot and select Plot - Predicted vs Measured. Specify the following
        parameters in the Predicted vs Measured dialog: (Lookup Image B028)
          
          Y-variable: 1
          
          Components: 2
          
          Samples: Validation
        Press OK.
        Turn on the regression line and the target line with View - Trend Lines. (Lookup Image B029)
        We see that the predictions are fairly good. Some samples are not so well predicted, but the overall correlation
        coefficient is good. The warnings issued are of no real consequence for this model.
        B029 PLS1, Predicted vs Measured Plot with trend lines
        (Predicted Y plotted against Measured Y for the 12 calibration samples; Model, (Y-var, PC): (PREFEREN, 2).
        Plot statistics: Elements: 12, Slope: 0.838829, Offset: 0.669368, Correlation: 0.921301,
        RMSEP: 0.830774, SEP: 0.855452, Bias: -0.139174.)
        B030 PLS1, Regression Coefficients Plot
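The statistics shown with plot B029 can be reproduced from a pair of measured and predicted vectors. The sketch below uses common definitions (bias as the mean of predicted minus measured, SEP as the bias-corrected standard deviation of the prediction errors); these definitions are an assumption, not taken from The Unscrambler's documentation. Under these definitions RMSEP² = Bias² + SEP²·(n − 1)/n, and the values printed in the plot fit that relation.

```python
import math
import numpy as np

def prediction_statistics(measured, predicted):
    """Slope, offset, correlation, RMSEP, SEP and bias for predicted vs measured values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = measured.size
    residuals = predicted - measured
    bias = float(residuals.mean())
    rmsep = math.sqrt(float((residuals ** 2).mean()))
    sep = math.sqrt(float(((residuals - bias) ** 2).sum()) / (n - 1))
    slope, offset = np.polyfit(measured, predicted, 1)   # regression line
    correlation = float(np.corrcoef(measured, predicted)[0, 1])
    return {"slope": float(slope), "offset": float(offset), "correlation": correlation,
            "rmsep": rmsep, "sep": sep, "bias": bias}

# Toy example (made-up numbers): every prediction is 0.5 too high.
toy = prediction_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(toy)

# Consistency check with the values printed in plot B029.
n, sep, bias = 12, 0.855452, -0.139174
rmsep_from_parts = math.sqrt(bias ** 2 + sep ** 2 * (n - 1) / n)
print(round(rmsep_from_parts, 4))   # close to the reported RMSEP of 0.830774
```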
        Tutorial B - Interpretation of the Regression Coefficients
        The regression coefficients are used to calculate the response value from the X-measurements. The size of the
        coefficients gives an indication of which variables have an important impact on the response variables.
        There are two kinds of regression coefficients, Bw and B. The Bw coefficients are calculated from the
        weighted data table and are used for interpretation. The B coefficients are calculated from the raw data table
        and are used for predictions.
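The relationship between the two kinds of coefficients can be sketched as follows. The example uses ordinary least squares as a stand-in for PLS, because the rescaling between weighted and raw coefficients is the same for any linear model. The data, the variable names and the 1/SDev weights are made up for illustration.

```python
import numpy as np

# Made-up data: three X-variables on very different scales, one response.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 3)) * np.array([1.0, 10.0, 100.0])
y = 4.0 + X @ np.array([0.5, 0.02, -0.003]) + 0.1 * rng.normal(size=12)

# 1/SDev weights for X and y.
wx = 1.0 / X.std(axis=0, ddof=1)
wy = 1.0 / y.std(ddof=1)
Xw, yw = X * wx, y * wy

# Weighted coefficients (the analogue of Bw): fitted on weighted, centered data.
bw, *_ = np.linalg.lstsq(Xw - Xw.mean(axis=0), yw - yw.mean(), rcond=None)

# Raw coefficients (the analogue of B and B0): undo the weighting.
b = bw * wx / wy
b0 = y.mean() - X.mean(axis=0) @ b

# Both routes give the same predictions in original units.
pred_raw = b0 + X @ b
pred_weighted = (yw.mean() + (Xw - Xw.mean(axis=0)) @ bw) / wy

print(np.round(bw, 3))   # comparable across variables: use for interpretation
print(b)                 # depends on the units of each variable: use for prediction
```

The weighted coefficients are on a common scale and can be compared directly, whereas the raw coefficients mix in the units of each variable, which is why they suit prediction but not interpretation.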
        Task
        Find which variables are important for predicting Y-variable Preference.
        How to Do It
        The estimated regression coefficients tell us the cumulative importance of each of the sensory variables to the
        consumer preference.
        Select Plot - Regression Coefficients. Double-click on the preview screen to make the plot fill the whole
        Viewer. Choose the Weighted coefficients (BW) option. Specify 2 Components before you click OK.
        (Lookup Image B030)
        Use Edit - Options to change the layout of the plot to Bars. Then, select Edit - Mark - Significant X-
        Variables Only. (Lookup Image B031)
        B031 PLS1, Regression Coefficients Plot after automatic marking of significant X-
        variables
        Redness, Color and Sweetness are statistically significant in predicting Preference. Raspberry Smell is also
        significant, but contributes negatively to Preference. Thickness also seems to be important, as it has a
        large (negative) coefficient; however, it is not shown to be significant in this model.
        Task
        Import regression coefficients into an Editor.
        How to Do It
        Select File - Import - Unscrambler Results and select Import data into New data table in the Import
        Target dialog. (Lookup Image B032) Select the file Tutorial B Sens-Pref in the Import dialog. You
        will find the file when you use File of Type: Regression.
        B032 The Import Target dialog
        B033 The Import from Regression Result dialog
        In the Import from Regression Result dialog, mark the matrix B and select PCs: 2 in the field below the
        matrix list, and then select B0 as well in the matrix list. (Lookup Image B033)
        Note that B may be used for prediction of new, un-weighted data, while Bw (studied above in the Regression
        Viewer) should be used with new, weighted data. Always identify important variables by studying Bw when
        the data used in the model have been weighted.
        Click OK. An Editor with the regression coefficients is launched. (Lookup Image B034) The B-coefficients
        can then be treated like any other data in an Editor. You may plot the coefficients from the Plot menu, etc.
        B034 Editor with the imported B coefficients from the PLS1 model relating
        Preference to sensory properties
Close the Editor with the imported B-coefficients before you proceed.
        Task
        Export the regression model used to predict Preference from Sensory Data.
        How to Do It
        Select Results - Regression and find the result file Tutorial B Sens-Pref. Mark it and look at the
        information given in the lower part of the dialog. Here you see which Sample and Variable Sets were used in
        the modeling, whether you used weighting, etc. The information given here is very useful when you want to
        find a particular model at a later stage.
        Click on the Export button. This launches the Export Model dialog. (Lookup Image B035) Select Ascii-
        Mod to launch the dialog Export ASCII-MOD.
        B035 The Export Model dialog
        B036 The Export ASCII-MOD dialog
        The optimal number of components should be used in the export. Therefore, change the number of PCs to 2
        before you click OK. (Lookup Image B036)
        Full ASCII-MOD export includes all results that are necessary to do outlier detection, etc. You may want to
        use this format if you need to use Unscrambler models outside The Unscrambler, for example in a program you
        wrote yourself. The ASCII-MOD file is readable by any ASCII editor.
        Task
        Predict the Preference for the jam samples.
        Interpret the prediction results to see whether the predictions can be trusted.
        How to Do It
        Activate the Tutor_b Editor. Select Task - Predict and specify the following parameters in the Prediction
        dialog: (Lookup Image B037)
          
          Samples: Prediction Sam [8]
          
          X-variables: Sensory [12]
          
          Y-reference: Not included
          
          Model: Tutorial B Sens-Pref
          
          Number of Components: 2
        Click OK to perform the prediction.
        B037 The Prediction dialog
        B038 Prediction Results, Predicted with Deviation Plot
        Tutorial B - Interpretation of Predicted with Deviation
        No reference measurements were made for the samples in the Prediction Sam Set. This makes it impossible
        to check predicted vs. measured values. Because we have made a model based on projection, we have an
        option left: To check the reliability of the predictions from the deviations.
        Task
        Interpret the Predicted with Deviation plot.
        How to Do It
        Click View in the Progress dialog to see the predicted with deviation plot. (Lookup Image B038)
        The predicted preferences for the unknown new jams have noticeable uncertainty limits, i.e. the accuracy of new
        predictions is not very good. Still, this model can be used to predict the preference of new jam samples to give an
        indication of which ones will be accepted or not by customers.
        Save the results file under the name Tutorial B Predict 1.
        Task
        Plot the RMSEP.
        How to Do It
        The information you need is stored with the PLS model Tutorial B Sens-Pref. Therefore, we have to find a
        way to look at those old results. This is done by opening a results Viewer again. Select Results -
        Regression. Mark the model and click View. The regression overview appears.
        Select Plot - Variances and RMSEP and go to the RMSE sheet. Double-click on the preview screen to fill
        the whole Viewer with the RMSE plot. (Lookup Image B039)
        B039 Variances and RMSEP dialog, RMSE sheet
        B040 PLS1, Root Mean Square Error Plot
        Select only the RMSEP in the Samples section. Click OK. (Lookup Image B040)
        Now you can study the RMSEP for Preference for all PCs. RMSEP (using two PCs) is 0.83. This means that
        any predicted new sample on the scale from 1 to 9 will have a prediction error around 0.8. This is an acceptable
        error level in sensory analysis, which has much uncertainty in all measurements.
Spectroscopy and Interference Problems (Tutorial C)
        Description of Tutorial C
        Context of Tutorial C
        We need an easy way to determine the concentration of dye (a brightly red-colored heme protein, Cytochrome-
        C), predicted variable Dye, in water solutions. Dye absorbs light in the visible range, and we want to base the
        concentration determination on this light absorbance.
        In the solutions to be analyzed there are varying, unknown amounts of milk, which absorbs some light in the
        same wavelength range as dye and therefore causes chemical interference in the measurements. In addition,
        milk contains particles that give serious light scattering.
        Another effect that will influence the absorbance spectra is the varying sample thickness.
        The Light Absorbance Spectrum figure shows the light absorbance spectrum of one sample of the
        dye/milk/water solution (Lookup Image C001). The vertical lines represent the 16 different wavelength
        channels selected as predicting variables (x1, x2, ..., x16) for this sample.
        This example is constructed so that it can be duplicated in a lab. It illustrates well the interference effects and
        other effects that make spectroscopy difficult. Similar problems occur in many industrial
        applications, e.g. when measuring the concentration of different chemical species in sewer water, which contains
        many other chemical agents, as well as physical interferences like slurries and particles.
        The two major peaks (channels x4 and x6) represent the absorbance of dye, while the first peak (x2)
        represents absorbance due to an absorbing component in the milk. The broad peak to the right (x12, x13 and
        x14) is due to light absorption by water itself.
        A problem similar to this tutorial is described extensively in chapter 8 in the book Multivariate Calibration,
        by Martens & Naes.
        Note that the known Milk and Water quantities will not be used to make the model, only as descriptors in
        result plots. The sample names are coded with these quantities as well.
Note: You will find the illustrations for this tutorial (Image C001, etc) at the end of the document.
        Task
        Open the data table and take a look at the properties of the data. Then define Sets to be used in the analyses.
        How to Do It
        Select File - Open and the file Tutor_c from the Examples directory. An Editor with the data table is
        launched.
        Go to Modify - Edit Set to define the necessary Variable and Sample Sets for later analyses in the Set
        Editor. Define the Variable Sets and Sample Sets by clicking Add and entering the intervals given here:
        Sample Sets:
         
         Name: Calibration
           Interval: 1-28
         
         Name: Prediction
           Interval: 29-42
        Click OK when you have finished defining the Variable and Sample Sets and save the Editor before you
        continue.
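If you later reproduce the analysis outside The Unscrambler, the same Sample Sets can be expressed as index arrays. The sketch below assumes zero-based indexing, as used by NumPy, and converts the 1-based, inclusive intervals given above. The helper name is hypothetical.

```python
import numpy as np

def interval_to_indices(first, last):
    """Convert a 1-based, inclusive interval (as entered in the Set Editor) to zero-based indices."""
    return np.arange(first - 1, last)

sample_sets = {
    "Calibration": interval_to_indices(1, 28),
    "Prediction": interval_to_indices(29, 42),
}

for name, idx in sample_sets.items():
    print(name, len(idx), "samples, rows", idx[0], "to", idx[-1])
```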
        Task
        Plot some calibration samples in order to see how the spectra vary with varying amount of dye and milk.
        How to Do It
        We want to plot samples that have the same amount of milk, 10 ml. Do this by marking the samples in the
        Editor (samples 6, 14, 19, and 23). Use Edit - Select Samples and specify the sample numbers in the dialog
        (Lookup Image C002). The Selection method should be Select. Click OK and you see that the four
        samples are marked in the Editor. You could do the same by clicking the sample numbers while holding down
        the <Ctrl> key.
        Select Plot - Line and specify that you wish to use the Variable set Absorbance in the Line Plot dialog
        (Lookup Image C003).
        These four samples have the same milk level, and the plot shows that the dye level influences the
        absorbance at wavelengths number 2 - 8 only.
Plot samples 20, 21, 22, and 23 the same way. These samples have the same dye level, 6 ml.
        The plot shows that increasing milk level will increase the absorbance of light of all wavelengths from number
        1 to number 16. There seems to be a great deal of interference or scattering to deal with, over the whole
        spectrum. This indicates that we may have to do some transformations of our data to get an optimal model.
Close the Viewer so that the Editor with the data is active.
        Task
        Find the best wavelength on which to make a univariate regression model.
        How to Do It
        You find the best wavelength by looking at the correlation between each absorbance variable and the Dye level
        variable. Activate the Tutor_c Editor. Select Task - Statistics and specify the following parameters in the
        Statistics dialog (Lookup Image C005).
         
         Samples: Calibration [28]
         
         Variables: Statistical [17]
        Click Close instead of View and save the result file with the name Tutorial C Statistics. We are going to
        import the correlation matrix from the result file into an Editor instead.
        Select File - Import - Unscrambler Results. Specify New data table in the Import Target dialog to
        avoid overwriting the data table in the Editor.
        In the Import dialog, change the Files of type to Statistics and select Tutorial C Statistics before you click
        Import to launch the Import from Statistics Result dialog (Lookup Image C006). The matrix where the
        correlation results are stored is called StatCorr and you should import Group 1.
        The variable with the highest correlation coefficient to Dye Level is Xvar6 with a correlation coefficient of
        0.49. Close the Editor with the correlation matrix; you do not need to save the Editor. The values in the Editor
        are the correlation coefficients between the variables.
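The search for the best single wavelength amounts to computing the correlation between each absorbance variable and the response and picking the largest one. The sketch below does this on synthetic data, not on the Tutor_c table, so the correlation it finds is much higher than the 0.49 reported above. The function name is hypothetical.

```python
import numpy as np

def best_single_variable(X, y):
    """Return the index of the column of X most correlated with y, and all correlations."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return int(np.argmax(np.abs(r))), r

# Synthetic data: 28 samples, 16 channels; only channel index 5 (the sixth
# variable) carries the "dye" signal. All numbers are made up.
rng = np.random.default_rng(0)
dye = rng.uniform(0.0, 10.0, size=28)
X_demo = rng.normal(size=(28, 16))
X_demo[:, 5] = 2.0 * dye + 0.1 * rng.normal(size=28)

best, r = best_single_variable(X_demo, dye)
print(best + 1, round(float(r[best]), 3))   # 1-based variable number and its correlation
```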
        Now we should illustrate the regression in a plot. To get the right plot we have to copy Xvar6 to a variable left
        of Dye Level. Mark the Xvar6 variable in the Tutor_c Editor. Then, click and hold the <Ctrl> key as you click
        inside the marked column and drag the Xvar6 until the Dye Level variable is framed. Release the mouse button
        and the Xvar6 is copied (Lookup Image C007).
        Mark the two variables and select Plot - 2D Scatter. Remember to plot only the calibration samples
        (Lookup Image C008).
        Turn on the Regression Line and Target Line with View - Trend Lines, if they are not turned on by
        default. Hopefully we can do better with multivariate regression models. Close the Viewer after you have
        studied the plot. Mark the copied variable in the Editor (column 3) and delete it.
        Tutorial C - Calibration
        We choose to make a PLS regression model because PLS takes the variation in Y into consideration when the
        model is calibrated.
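How PLS1 takes Y into account can be seen in a compact NIPALS-style sketch: each weight vector is computed from the covariance between the current X-residuals and the current y-residuals. This is a textbook version written with NumPy under simplifying assumptions (no weighting, no outlier handling), not The Unscrambler's implementation. The leave-one-out function illustrates full cross validation, and the data are synthetic.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 with orthogonal scores; return (b0, b) for raw, unweighted data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                    # direction in X that covaries most with y
        w /= np.linalg.norm(w)
        t = E @ w                      # scores
        tt = t @ t
        p = E.T @ t / tt               # X-loadings
        c = f @ t / tt                 # y-loading
        E = E - np.outer(t, p)         # deflate X
        f = f - c * t                  # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    b0 = y_mean - x_mean @ b
    return b0, b

def loo_rmsep(X, y, n_components):
    """Root mean square error of prediction from leave-one-out (full) cross validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b0, b = pls1_fit(X[keep], y[keep], n_components)
        errors.append(b0 + X[i] @ b - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic data (not the Tutor_c table).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=20)

b0_full, b_full = pls1_fit(X_demo, y_demo, n_components=3)
rmsep_by_components = [loo_rmsep(X_demo, y_demo, a) for a in (1, 2, 3)]
print(np.round(b_full, 3), round(float(b0_full), 3))
print(np.round(rmsep_by_components, 3))
```

With as many components as X-variables, this sketch reproduces the ordinary least squares solution; with fewer components it keeps only the directions in X that are most relevant for y.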
        Task
        Make a PLS regression model between the variable set Absorbance (X) and the variable set Dye Level(Y).
        How to Do It
        Activate the Tutor_c Editor and select Task - Regression. In the Regression dialog, specify the following
        parameters:
         
         Method: PLS1
         
         Samples: Calibration [28]
         
         X-variables: Absorbance [16]
         
         Y-variables: Dye Level [1]
         
         Weights: All 1.0 in X and Y
         
         Validation method: Leverage Correction
         
         Num PCs: 10
        Start the calibration by clicking OK.
        Task
        Find an outlier by looking at warnings and plots.
        How to Do It
        Click View to enter the Regression Overview plot. This shows the most important regression results, but
        we are more interested in the warning list. Select Scores plot by clicking on it. (Lookup Image C009)
        Select Window - Warning List if the warning list is not visible. A dockable view appears with all warnings
        listed (Lookup Image C010). The first warnings indicate that some samples are outliers. Look for further
        information in the outlier list by clicking the outliers button.
Sample 8 is listed frequently. Investigate that sample further by plotting the raw data table.
        Activate the Tutor_c Editor, mark samples 7, 8, 9, and 10. Select Plot - Line and use the Variable set
        Absorbance (Lookup Image C011).
        It is obvious that the pattern in sample 8 is atypical. Samples that are very different from others may distort the
        model so much that it becomes useless for future use. So this sample should not be included in the calibration
        samples used to make the model.
        The detection of outliers and the way you should treat them is an important, but difficult task. It makes no
        sense to interpret the model as long as outliers are present. Close the Viewer with the line plot and save the
        result file with the name Tutorial C. Now you should make a new model without the known outliers.
        Task
        Make a new PLS1 model with the same parameters as before, but with sample 8 kept out of the calculations.
        How to Do It
        Activate the Tutor_c Editor and select Task - Regression. In the Regression dialog, specify the following
        parameters:
         
         Method: PLS1
        Go to the Samples sheet and click Select next to the Keep Out of Calculation field. An Editor pops up
        where you can mark the samples that should not be used in the calibration. Mark sample 8. Several samples are
        marked by holding down <Ctrl> while clicking on the samples to be marked. If you mark some other samples,
        you may deselect them by holding down <Ctrl> while you click on the undesired samples. Click OK and the
        sample 8 is inserted in the Keep Out of Calculation field.
        There are still some warnings issued, but they do not do any real harm to the model. We go on and proceed
        to look at other modeling results. Do not click View yet!
        Task
        Study the residual variance in the model.
        How to Do It
        We want to study the prediction error in the screen output.
        (Lookup Image C012). The horizontal bars in the PLS1 Regression Progress dialog indicate the residual
        variance after each PC. The first bar, in PC 0, represents the total variance. The second bar is about 10%
        smaller, meaning that PC no 1 explains about 10% of the total variance. After PC no 2 about 2/3 of the total
        variance has been explained. The numerical value of the residual Y-variance is shown, too.
        The variation in the calibration samples cannot be described significantly better with any new PC after the five
        first PCs. Very little more variance is explained by PC 8, but we still have not explained all of the variance.
        After 8 PCs, observe how the prediction variance now increases slightly again, due to overfitting and noise
        modeling.
        The minimum estimated residual variance is less in this run than in the previous run: now 1.1 compared to 2.2
        in the first model. It seems that seven PCs will give the optimal model. Eight PCs give a smaller variance, but
        the difference is too small to motivate the use of more PCs.
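The rule of thumb used here (prefer the smallest number of PCs whose residual variance is close to the minimum) can be sketched in a few lines of Python. This is only an illustration: the function name, the 5% tolerance and the variance curve are assumptions made up for the example, not values or rules taken from The Unscrambler.

```python
def choose_num_pcs(residual_variance, tolerance=0.05):
    """Return the smallest number of PCs whose residual variance lies
    within `tolerance` (a fraction) of the minimum of the curve.

    residual_variance[k] is the validated residual Y-variance after
    k PCs; index 0 holds the total variance.
    """
    threshold = min(residual_variance) * (1.0 + tolerance)
    return next(k for k, v in enumerate(residual_variance) if v <= threshold)


# Made-up curve for illustration only (not the tutorial's actual numbers).
curve = [30.0, 27.0, 10.0, 6.0, 3.5, 2.0, 1.6, 1.14, 1.10, 1.25, 1.40]
print(choose_num_pcs(curve))
```

With this made-up curve the minimum is reached at eight PCs, but seven PCs are already within the tolerance, so the simpler model is preferred.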
        If the model has successfully described the systematic variation, we can start to interpret additional modeling
        results. The most important model results to study are then the Scores, the Loadings, and Predicted vs
        Measured.
        Task
        Interpret the plots in the regression overview.
        How to Do It
        The regression overview was launched when you clicked View (Lookup Image C013). It consists of four
        plots of the most important modeling results from the regression model. Save the results file under the name
        Tutorial C No Outliers before you continue.
        The plot in the lower left corner is the residual variance. These are the same results that you saw in the regression
        progress dialog while the model was being calibrated. We do not comment further on this plot.
        Score Plot
        The plot in the upper left corner is the Scores plot. From the Scores plot we can see that the combination
        of the two main PCs, PC 1 and PC 2, reflects the variations in the milk and water levels. The milk level increases
        from upper left to lower right in the plot, while the water level increases from right to left.
        Select Edit - Options and go to the Sample Grouping sheet, where you check Enable Sample Grouping.
        Go to Markers Layout and select Value of Variable where you specify Y-variables 1. The score plot now
        reveals a clear pattern from lower left to upper right in the plot. You would see the same information in a 2D
        scatter loading plot.
        Regression Coefficients
        The plot in the upper right corner displays a line plot of the regression coefficients instead of the 2D scatter plot of
        loadings and loading weights that is the default when models are made from non-spectral data. This
        happens because we changed the Data Type to Spectra (see section Read Data File and Define Sets). It is
        easier to interpret regression coefficients plots than loading and loading weights plots when the variables
        are functions of another implicit variable, such as wavelength or time.
        Use Edit - Options to change the plot layout to bars instead of a curve.
        The regression coefficients plot summarizes the relationship between all predictors and a given response.
        Task
        Take a closer look at the residual variances in the error measures plots.
        How to Do It
        Activate the Predicted vs Measured plot and select Plot - Variances and RMSEP and go to the X- and
        Y-variance sheet, where you specify the following parameters: (Lookup Image C014)
         
         Variables: Remove the number in the X: and Y: boxes. Only the total variances should be plotted
         
         Samples: Both Calibration and Validation
        Change the variance from residual to explained by selecting View - Source - Explained Variance
        (Lookup Image C015). The upper plot shows that the model describes much of the variance in the
        X-variables in the first PCs, while in the lower plot it takes more PCs to describe the variance in Y (dye level).
        We are interested in describing Y, therefore we have to include enough PCs in our model to get a high
        explained variance for the Y-variable.
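Switching the plot from residual to explained variance is a simple rescaling against the total variance (the value at PC 0). A minimal Python sketch, assuming the residual variances are available as a plain list; the function name and the numbers are hypothetical:

```python
def explained_variance_percent(residual_variance):
    """Convert residual variances (index 0 = total variance, i.e. 0 PCs)
    into explained variance in percent of the total."""
    total = residual_variance[0]
    return [100.0 * (1.0 - r / total) for r in residual_variance]


# Made-up residual variances after 0, 1 and 2 PCs.
print(explained_variance_percent([4.0, 3.0, 1.0]))
```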
        Note that the model results are available to you as predefined plots in the Plot menu when you have a result
        Viewer active. Activate the Tutor_c Editor and see that the Plot menu changes to general plot options.
        Sometimes you close the result Viewer by accident. You can then get the predefined plots back by selecting
        Results - Regression and opening the Tutorial C No Outliers result file with the View button. The result
        Viewer with the regression overview is launched.
        Task
        Correct the data for multiplicative scatter effects. Omit variables 1 to 8 in the Set Absorbance as important
        variables.
        First, we verify the need for MSC by looking at the Scatter Effects plot. This plot is available from a
        Statistics model. Select Task - Statistics and specify the following parameters in the Statistics dialog:
         Samples: Calibration [28]
         
         
         Keep out of calculation (samples): 8
         Variables: Absorbance [16]
         
        Click OK to make the model and click View in the Progress dialog when it is finished. Then use Plot -
        Statistics and select the Scatter sheet. Click the All button and then OK to make the plot.
        The plot has to be scaled so the origin is shown. Do this with View - Scaling - Min/Max and enter 0 in both
        From fields (Lookup Image C016). Click OK.
        The regression lines intersect roughly at the origin, which indicates no need to correct for offset (Lookup
        Image C017). But the regression lines have different slopes, which calls for MSC using common
        amplification.
        Close the Viewer before you continue.
        Select Modify - Transform - MSC. Specify the following parameters in the Multiplicative Scatter
        Correction dialog: (Lookup Image C018)
         
         Samples: Calibration [28]
         
         Variables: Absorbance [16]
         
         Function: Common Amplification
         
         Test Samples: 8
         
         Omit Important Variables: 1-8
        Test samples are not used when computing the correction factors for the MSC. Sample 8 is
        an outlier and would give slightly inferior results if it were used, so we list it under Test Samples to
        keep it out of that computation.
        Variables 1-8 are omitted as important because the light absorption at these wavelengths varies with the dye level,
        while the absorption at wavelengths 9 to 16 (the water absorption peak) is independent of the concentration of dye. The
        differences at these wavelengths are instead caused by the general light scatter due to milk addition. It is
        important that only wavelengths with no chemical information are used to find the correction factors.
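The idea behind the correction can be sketched in plain Python. This is a simplified illustration, not The Unscrambler's implementation: it assumes that each spectrum is regressed (ordinary least squares with an intercept) on the mean spectrum of the reference samples, using only the variables that were not omitted, and that "common amplification" then divides the whole spectrum by the fitted slope. All names and the toy data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def msc_common_amplification(spectra, reference_rows, fit_columns):
    """Divide each spectrum by its slope against the mean reference spectrum.

    spectra        : list of rows, one spectrum per sample
    reference_rows : row indices used to build the mean spectrum
                     (outliers / test samples are left out of this list)
    fit_columns    : column indices used in the regression, i.e. the
                     variables NOT omitted as important
    """
    n_vars = len(spectra[0])
    mean_spectrum = [
        sum(spectra[r][j] for r in reference_rows) / len(reference_rows)
        for j in range(n_vars)
    ]
    reference = [mean_spectrum[j] for j in fit_columns]
    corrected = []
    for row in spectra:
        _, slope = fit_line(reference, [row[j] for j in fit_columns])
        corrected.append([value / slope for value in row])
    return corrected


# Toy data: one base spectrum seen through three different amplifications.
base = [1.0, 2.0, 3.0, 4.0]
spectra = [[0.5 * v for v in base], list(base), [1.5 * v for v in base]]
corrected = msc_common_amplification(spectra, reference_rows=[0, 1, 2], fit_columns=[2, 3])
print(corrected)
```

In the setting of this tutorial, `fit_columns` would correspond to wavelengths 9 to 16 and sample 8 would be left out of `reference_rows`.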
        Save the MSC model with the name Tutorial C MSC Model. Save the corrected data now displayed in the
        Editor with the name Tutorial C MSCorrected.
        Look at the corrected data by launching a general Viewer (Results - General View ) and selecting Plot -
        Line. Select the data file you just saved with the corrected data in the Line Plot dialog. Plot Samples 20 -
        23 using the Variables Set Absorbance. (Lookup Image C019)
        Now we are going to plot the original data, but this time not from the original data file. Select Plot - Line and
        find the result file Tutorial C No Outliers (Lookup Image C021). You see that the raw data from which the
        model is made is saved together with the model matrices. This time you do not have to specify a Variable Set
        because the raw data used for the model is only the X-variables from that Set.
        Plot samples 20 - 23 from Xraw (Lookup Image C022). You see that the MSCorrected data are different
        from the original. The interference and light scatter effects have successfully been corrected for.
        Task
        Make a PLS1 model with the same model parameters as the model Tutorial C No Outliers.
        How to Do It
        Activate the Editor with the corrected data. Select Task - Regression and specify the following parameters
        in the Regression dialog:
         
         Method: PLS1
         
         Samples: Calibration [28]
         
         X-variables: Absorbance [16]
         
         Y-variables: Dye Level [1]
         
         Weights: All 1.0 in X and Y
         
         Validation Method: Leverage Correction
         
         Num PCs: 10
        Click OK to make the model. See how the residual variance decreases faster for each PC in this model
        compared to the previous models. The MSCorrection has improved the model.
        Click Close and save the model under the name Tutorial C MSCorrected.
        How to Do It
        Select Results - General View and then Plot - Line. Click the Browse button against Source and find the
        result file Tutorial C. The matrix we are interested in is called ResYValTot (Lookup Image C023).
        Select Edit - Add Plot and plot the same matrix for the model Tutorial C No Outliers and Tutorial C
        MSCorrected.
        The plot shows the validated residual Y-variance for the three models (Lookup Image C024). From this plot
        we find that the minimum square error is approximately:
         
         for Tutorial C MSCorrected using 6 PCs,
         
         for Tutorial C No Outliers using 8 PCs
         
         for Tutorial C using 3 PCs
        Tutorial C MSCorrected with six PCs gives the lowest estimate for the residual Y-variance. Predictions made
        with this model using six PCs can therefore be expected to have the lowest prediction error.
        Note again how the Results menu is your way to look at results from older models.
        Task
        Finally, let us see how large an error, in ml dye, we have to expect in future predictions: the Root Mean Square
        Error of Prediction (RMSEP).
        How to Do It
        Activate the regression overview Viewer. Select Plot - Variance and RMSEP and go to the RMSE sheet.
        Double-click the screen preview to display the plot in the whole Viewer. De-select the calibration samples box
        and tick the validation samples (RMSEP) instead (Lookup Image C025).
        You see that the shape of the curve is exactly that of the residual variance, but the values have changed. The
        plot says that predictions done with this model and using six PCs will have an average prediction error of 0.98.
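RMSEP is expressed in the same unit as the response because it is the square root of the mean squared prediction residual. A minimal Python sketch of that definition follows; the function name and the numbers are made up, and the way validation predictions are produced (leverage correction, cross validation or a test set) is outside the scope of the example.

```python
import math


def rmsep(measured, predicted):
    """Root mean square error of prediction for one response variable."""
    squared = [(p - m) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up reference and predicted values, for illustration only.
print(rmsep([10.0, 12.0, 14.0], [11.0, 12.0, 13.0]))
```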
        Task
        MSCorrect the prediction samples.
        How to Do It
        Go back to your data table Tutorial C MSCorrected containing the MSCorrected training samples. In order to
        correct the prediction samples, use Modify - Transform - MSC. Specify the following parameters in the
        Multiplicative Scatter Correction dialog: (Lookup Image C026)
         
         Samples: Prediction [14]
         
         Variables: Absorbance [16]
         
         Use Existing MSC Model: Tutorial C MSC Model
        Click OK and save the MSC coefficients. The prediction samples are then changed according to the
        MSCorrection you found previously.
        Task
        Predict the dye level of these samples.
        How to Do It
        Select Task - Predict. Specify the following parameters in the Prediction dialog: (Lookup Image C027)
         
         Samples: Prediction [14]
         
         X-variables: Absorbance [16]
         
         Y-reference: None
         
         Model name: Tutorial C MSCorrected
         
         Number of Components: 6
        Click View after the prediction is done. The prediction overview plot appears, where the predicted values are
        shown together with the deviations (Lookup Image C028). Large deviations indicate that the predictions
        cannot be trusted.
         
         Read Data: File - Open or File - Import. You can import data from many instruments - directly or via
           e.g. JCAMP-DX or ASCII. Many instruments also write U5 data files or Unsc-ASCII data files.
         
         View and Prepare Data: Look at the Editor, define sets. Select some samples and Plot - Line or Matrix
           to get an overview of the spectra (data plot). Histograms of Y-variables are useful too, as well as 3D
           scatter plots of constituents if there are several.
         
         Pre-process: Modify - Transform allows you to do spectroscopic transformations, derivation,
           smoothing, etc. Modify - Reduce (Average) may be useful too. If you have a data plot of your spectra
           open, you will see how the spectra change on the fly.
         
         Statistics: Task - Statistics may be useful. The Statistics plot Scatter reveals scatter problems.
         
         Select Samples: If you need to throw away data to get a more balanced data set you may make a PCA of
           the spectra or the constituents. From the Score plot, use Edit - Mark and mark samples that span all the
           important components (samples far away from the origin, but not extremes.) Select Task - Extract
           Marked and save as a new file.
         
         Reduce Spectra: If you need to use fewer wavelengths, or perhaps only a range of the spectra, select
           Modify - Edit Set - Add - Special intervals - Select Every n variables - Update. You can now
           change the starting point in the Interval field. Click OK twice to save the Set, e.g. under the Name New
           Set. Then, choose Edit - Select Variables - Set and select the NewSet. The marked variables can now
           be deleted, and you can save the new data file under a new name.
         
         Make First Calibration Model and Look for Outliers: Task - Regression - PLS2 gives a nice
           overview if you have several constituents. Otherwise use PLS1. View the results, especially Variance,
           Scores and Predicted vs Measured. Plotting results, use Edit - Mark (also available under right mouse
           button) to mark suspicious samples in the score plots. Plot - Sample outliers and XY Relation outliers
           are useful to investigate them. You will see that the samples are marked in those plots too (and all other
           sample based plots).
         
         View - Raw data produces a link to the raw data table, high-lighting the marked samples - or vice versa!
           Mark in the raw data table and see them marked in the corresponding plots.
         
         Refine the Model: Task - Recalculate without Marked, gives a new model with the marked samples
           removed. Compare results, and look for more outliers. Repeat if necessary.
         
         Study the Model in Detail: Plot - Variances and RMSEP - RMSE/Important variables/Predicted
           versus measured are useful tools. View - Trend lines - Regression line and View - Plot
           statistics are useful too. Scores plots using Edit - Options - Sample grouping (also under right
           mouse button) are excellent for investigating patterns.
         
         Delete Wavelengths: From the Important variables plot you can Edit - Mark ranges in spectra that are
           not important (potentially noisy). Task - Recalculate without Marked gives you a new model based
           on fewer wavelengths, that is possibly more rugged and with a smaller prediction error.
         
         Validation: Before you finish, make sure the model is properly validated using a suitable cross validation
           or test set. Always keep replicates of the same samples in the same segment.
         
         Additional Tools: Statistics on the B-vector is helpful to determine the number of PCs. Use Results -
           General view - Plot - Line to plot the B-diagnosis for the model (Statistics, vector 1-6). Vector no 4 and
           5 are especially useful (Bsum and SquSum B).
         
         File - Import - Unscrambler results lets you see the numerical values of all results, e.g. B (the
           regression coefficients) or ExtraVal, which contains information about the need for slope and bias
           adjustment. Use Help for details.
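The validation advice in the checklist above (keep replicates of the same sample in the same segment) can be illustrated with a small Python sketch. It assumes that every row of the data table carries a sample identifier shared by its replicates; the function and the identifiers are hypothetical and do not describe how The Unscrambler builds its segments.

```python
def assign_segments(sample_ids, num_segments):
    """Map each row to a cross-validation segment so that all rows sharing
    a sample ID (replicates) land in the same segment."""
    segment_of_id = {}
    for sample_id in sample_ids:
        if sample_id not in segment_of_id:
            # New physical sample: give it the next segment in turn.
            segment_of_id[sample_id] = len(segment_of_id) % num_segments
    return [segment_of_id[sample_id] for sample_id in sample_ids]


# Made-up identifiers: "A", "B" and "C" were each measured twice.
ids = ["A", "A", "B", "C", "B", "D", "C"]
print(assign_segments(ids, 3))
```

Because segments are assigned per identifier rather than per row, a replicate can never end up validating a model that was calibrated on its twin.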
        A standard method for the synthesis of enamine from a ketone gave some problems, and a modified procedure
        was investigated. A first series of experiments gave two important results:
          
          A new procedure was built up, which shortened reaction time considerably;
          
          It was shown that the optimal operational conditions were highly dependent on the structure of the original
            ketone.
        Thus, a new investigation had to be conducted to study the specific case of the formation of morpholine
        enamine from methyl isobutyl ketone. It was decided to adopt a 2-step strategy:
          
          First, at a screening stage, study the main effects of 4 factors (relative amounts of the reagents, stirring rate
            and reaction temperature) and their possible interactions;
          
          Then, conduct an optimization investigation with a reduced number of factors.
        Task
        Select a screening design which requires a maximum of 11 experiments that will make it possible to estimate
        all main effects and detect the existence of 2-factor interactions.
        Note: With 4 design variables, you need a fractional factorial design to keep the number of experiments lower
        than 16 (2^4).
        How to Do It
        Choose File - New Design to launch the Design Wizard, where you can generate a designed data table.
        In the Design Wizard - Select Method to Use dialog, choose to build the design From Scratch and Click
        Next.
        This launches the Design Wizard - Select Design Type dialog, where you select Create Fractional
        Factorial Design and proceed by clicking Next.
        Do this by clicking the New button. This launches the Add Design Variable dialog (Lookup Image
        D001), where you must enter the name of the new variable (e.g. TiCl4, Morpholine, Temperature and
        Stirring), select Continuous, and enter the low and high levels as stated above. Validate by clicking OK and
        enter the next variable by Clicking New again.
        Note: In order to be allowed to specify center samples, you will have to define Stirring rate as a continuous
        variable; you can give it the arbitrary levels -1 and 1, where -1 stands for no stirring and 1 stands for high
        stirring.
        Click Next to launch the Design Details dialog. Keep Number of Replicates to 1, and add 3 Center
        Samples (Lookup Image D002).
        Once you are satisfied with your design specifications, click Finish to exit. The generated design is
        automatically displayed on screen (Lookup Image D003).
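For readers who want to see what such a design looks like in coded units, here is a small Python sketch. It assumes the usual half fraction of a 2^4 design in which the fourth factor is generated as D = A*B*C; this generator is an assumption (it is consistent with the AB/CD confounding reported later in this tutorial), and the run order produced by the wizard will differ.

```python
from itertools import product


def fractional_factorial_4_factors(num_center=3):
    """Half fraction of a 2^4 design in coded units (-1 / +1).

    A, B and C form a full factorial; the fourth factor is ASSUMED to be
    generated as D = A*B*C. Center samples are appended as rows of zeros.
    """
    runs = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
    runs += [(0, 0, 0, 0)] * num_center
    return runs


design = fractional_factorial_4_factors()
for run in design:
    print(run)
print(len(design), "experiments")
```

Eight factorial runs plus three center samples give 11 experiments, which matches the limit stated in the task.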
        You can use the View menu to toggle between display options. Try Sample Names and Point Names,
        Standard Sample Sequence and Experiment Sample Sequence (randomized order).
        It should now be safe to store your new data table into a file, using File - Save As; give it a name, e.g. Enam
        FRD. Note that you should not overwrite the existing file Enam_frd. You will need this file later in the
        tutorial.
        Task
        Run an Analysis of Effects.
        How to Do It
        First, you should enter the response values. Since this has already been done, you just need to read the
        complete file. Use File - Open, and select from the Designed Data list in the Open File dialog the file
        named Enam_frd, which already contains the response values.
        Task
        Interpret the results of the Analysis of Effects that you have just run.
        How to Do It
        The Effects Overview plot shows which effects are significant (Lookup Image D005). By default, the
        Significance Testing Method is Center.
        Select Plot - Effects and choose COSCIND as Significance Testing Method on the Overview sheet in the
        Effects dialog. Click OK to display the new plot. (Lookup Image D006)
        You can see that three effects are considered to be significant: Main effect TiCl4 (A), Interaction AB or CD,
        and Main effect Morpholine (B).
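Why the plot can only report "AB or CD" becomes clear when the sign columns of the design are written out. The sketch below assumes the half-fraction generator D = A*B*C and uses made-up response values built from known effects; it is not the tutorial's Yield data, and the significance tests themselves are not reproduced.

```python
from itertools import product

# Coded half-fraction design, ASSUMING the generator D = A*B*C.
runs = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


def effect(column, response):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for sign, y in zip(column, response) if sign > 0]
    low = [y for sign, y in zip(column, response) if sign < 0]
    return sum(high) / len(high) - sum(low) / len(low)


col_a = [a for a, b, c, d in runs]
col_b = [b for a, b, c, d in runs]
col_ab = [a * b for a, b, c, d in runs]
col_cd = [c * d for a, b, c, d in runs]

# The AB and CD columns are identical, so their effects cannot be separated.
aliased = col_ab == col_cd

# Made-up responses (NOT the tutorial's Yield values), built from known effects.
response = [50 + 5 * a + 10 * b + 3 * a * b for a, b, c, d in runs]
print(aliased, effect(col_a, response), effect(col_b, response), effect(col_ab, response))
```

Each estimated effect is twice the corresponding coefficient used to build the made-up response, because an effect measures the change from the low level (-1) to the high level (+1).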
        Select Window - Copy to - 2 : this copies the Effects Overview plot into sub-view 2 (the upper sub-view in
        a system of two). Activate the lower sub-view (which is currently empty), and use Plot - Effects. On the
        Response Details sheet (Lookup Image D007), select Normal Probability in the Plot type field, and
        remove the option Include Table.
        The normal probability plot of the effects (Lookup Image D008) confirms the results of the Effects
        Overview: the effect of Morpholine (B) is clearly very significant, and AB=CD and TiCl4 (A) are also likely
        to be significant.
        Task
        Check the data for non-linearities.
        Click in the lower plot and use Plot - Statistics. On the Compressed sheet (Lookup Image D010), go to
        the Sample Groups field, where you specify that you wish to plot groups containing Design and Center
        samples. Validate your choices with OK.
        The lower plot (Lookup Image D011) now displays the mean and standard deviation of all Design samples
        compared to that of the Center samples only.
        You can see that the standard deviation for the center samples is about half the overall standard deviation. This
        indicates some lack of reproducibility in the center samples; this is why most of the effects observed in the
        Analysis of Effects were not found significant according to the Center Significance testing method. If you go
        back to the Editor and study the Yield values, you will notice that center sample Cent-c has a very different
        value from Cent-a and -b; maybe that experiment was not performed correctly.
        The other important information conveyed by the plot is that there is a strong non-linearity in the actual
        relationship between Yield and the design variables: The mean value for the center samples is much higher
        than for the overall design.
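The comparison made in this plot amounts to a few summary statistics. A minimal Python sketch, with made-up yields rather than the values in Enam_frd and a hypothetical function name:

```python
from statistics import mean, stdev


def curvature_check(design_yields, center_yields):
    """Compare center samples with the design samples.

    A center mean far from the design mean suggests curvature
    (non-linearity); a large center standard deviation suggests poor
    reproducibility.
    """
    return {
        "design_mean": mean(design_yields),
        "design_sdev": stdev(design_yields),
        "center_mean": mean(center_yields),
        "center_sdev": stdev(center_yields),
        "difference": mean(center_yields) - mean(design_yields),
    }


# Made-up numbers for illustration only.
summary = curvature_check([10.0, 20.0, 30.0, 40.0], [50.0, 52.0, 54.0])
print(summary)
```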
        How to Do It
        Choose File - New Design to launch the Design Wizard, where you will be able to generate a designed
        data table. In the Select Method to Use dialog, choose to build the design from scratch and Click Next.
        This launches the Select Design Type dialog, where you select Optimization designs: Central
        Composite and validate by Clicking Next.
        In the Define Design Variables dialog, you will specify the variables TiCl4 and Morpholine with the same
        ranges of variation as before (0.6 to 0.9 and 3.7 to 7.3, respectively), as follows:
        Click New to launch the Add Design Variable dialog, where you must enter the name of the new variable,
        select Continuous, and enter the Low and High levels. Validate by Clicking OK.
        Enter the next variable by Clicking New again.
        When both variables have been defined, check that the Define Design Variables dialog indicates the correct
        Star Points Distance from Center, namely 1.41.
        After all design variables have been defined, click Next to enter the Define Non-design Variables dialog,
        where you click New to define the non-designed response variable Yield in the Add Non-designed
        Variable dialog.
        Once you are satisfied with your variable definitions, use Next to get into the Design Details dialog, where
        you set the Number of Replicates to 1 and the Number of Center Samples to 5.
        You need not make any further specification in the next dialog, Randomization Details. Click Next again
        to launch the Last Checks dialog, where you make sure that all your design parameters have the correct
        values. The design should include a total of 13 experiments. Otherwise, use Back to go to the appropriate
        dialog and make the necessary corrections.
        Once you are satisfied with your design specifications, use Finish to exit. The generated design is
        automatically displayed on screen. Save your design for further use, e.g. with the name Enam CCD.
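The count of 13 experiments can be checked with a short Python sketch of a two-variable central composite design: 4 cube points, 4 star points and 5 center samples. The sketch assumes that the Low and High levels entered in the wizard are the cube levels and that the star points lie at a distance of sqrt(2) (about 1.41) from the center in coded units; function and variable names are hypothetical.

```python
import math
from itertools import product


def central_composite_2_factors(ranges, num_center=5):
    """Return the design points in actual units.

    ranges: [(low, high), (low, high)], ASSUMED to be the cube levels.
    """
    alpha = math.sqrt(2)  # star distance for 2 variables, about 1.41
    cube = list(product((-1.0, 1.0), repeat=2))
    star = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
    center = [(0.0, 0.0)] * num_center

    def to_actual(level, low, high):
        # Coded level 0 is the midpoint; +/-1 are the low and high levels.
        return (low + high) / 2.0 + level * (high - low) / 2.0

    return [
        tuple(to_actual(x, low, high) for x, (low, high) in zip(point, ranges))
        for point in cube + star + center
    ]


# TiCl4 from 0.6 to 0.9 and Morpholine from 3.7 to 7.3, as in the tutorial.
ccd = central_composite_2_factors([(0.6, 0.9), (3.7, 7.3)])
for point in ccd:
    print(point)
print(len(ccd), "experiments")
```

Under these assumptions the star points fall slightly outside the Low and High levels of each variable.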
        Task
        Run a Response Surface Analysis.
        How to Do It
        Normally, you would first have to enter the response values, but this has already been done. From the
        Designed Data list in the Open File dialog, open the file named Enam_ccd, which already contains the
        response values.
        Choose Task - Response Surface. In the Response Surface dialog (Lookup Image D012), make the
        following selections:
          
          Samples: Default
          
          X-var: Design Vars + Int + Squ (2+3)
          
          Y-variables: Default
        When the computations are done, click View to study the results. Do not forget to save the file before you start
        interpreting the results!
        Task
        Interpret the results from the Response Surface Analysis.
        How to Do It
        The viewer displays a Response Surface Overview, which consists of 4 plots (Lookup Image D013):
        Analysis of Variance, Residuals, Response Surface visualized as a contour plot, and Response Surface
        visualized as a landscape plot.
        First, study the ANOVA results. Use Window - Copy To - 1 to copy the upper left plot to sub-view 1
        (which covers the whole Viewer window). You can adjust the width of the various columns of the table if
        necessary (Lookup Image D014). Study in turn: Summary, Model Check, Variables, and Lack of Fit.
          
          The Summary shows that the model is globally significant, so we can go on with the interpretation.
          
          The Model Check indicates that the quadratic part of the model is significant, which shows that the
            interaction and square effects included in the model are useful.
          
          The ANOVA table for variables displays the values of the b-coefficients, and their significance. You see
            that the most significant coefficients are for the linear and quadratic effects of Morpholine; the quadratic
            effect of TiCl4 is close to the 0.05 significance level. That section of the table also tells you that the
            maximum point is reached for TiCl4 = 0.835 and Morpholine = 6.504; the information displayed on top of
            the table shows a Predicted Max Point Value of 96.747.
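The maximum point of a quadratic response surface is the point where both partial derivatives are zero. The Python sketch below solves that 2 x 2 linear system for a model of the form y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1^2 + b22*x2^2. The coefficients are made up for the example; they are not the b-coefficients from the ANOVA table.

```python
def stationary_point(b0, b1, b2, b12, b11, b22):
    """Return (x1, x2, y, is_maximum) for the quadratic surface
    y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1**2 + b22*x2**2.

    Setting both partial derivatives to zero gives
        2*b11*x1 +   b12*x2 = -b1
          b12*x1 + 2*b22*x2 = -b2
    which is solved here with Cramer's rule.
    """
    det = 4.0 * b11 * b22 - b12 ** 2
    if det == 0:
        raise ValueError("the surface has no unique stationary point")
    x1 = (-2.0 * b22 * b1 + b12 * b2) / det
    x2 = (-2.0 * b11 * b2 + b12 * b1) / det
    y = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1 ** 2 + b22 * x2 ** 2
    # Maximum when the matrix of second derivatives is negative definite.
    is_maximum = b11 < 0 and det > 0
    return x1, x2, y, is_maximum


# Made-up coefficients: y = 90 - 2*(x1 - 1)**2 - (x2 - 2)**2, expanded.
print(stationary_point(b0=84.0, b1=4.0, b2=4.0, b12=0.0, b11=-2.0, b22=-1.0))
```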
        Task
        Check the residuals from the Response Surface Analysis.
        How to Do It
        The upper right sub-view (if necessary, use Window - Go To - 5) in the Response Surface Overview plot
        shows a Normal Probability plot of the residuals. This plot can be used to detect any outliers. Here, you see
        that the residuals form two groups (positive residuals and negative ones). Apart from that, they lie roughly
        along a straight line, and no extreme residual is to be found outside that line. This means that there is no
        apparent outlier.
        From that window, go to Plot - Residuals and select Y-Residuals vs Predicted Y on the General sheet
        (Lookup Image D015). Try alternatively the two options Residuals (which shows the raw residuals) and
        Studentized (which shows transformed residuals that can be compared to a Student distribution).
        In the Studentized residuals plot (Lookup Image D016), all values are within the (-2;+2) range, which
        confirms that there are no outliers. Furthermore, there is no clear pattern in the residuals, so nothing seems to
        be wrong with the model.
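To make the (-2;+2) rule concrete, the Python sketch below computes internally studentized residuals for the simplest possible case, a straight-line fit with one predictor. This is a deliberate simplification: the response surface model here is quadratic in two variables, and the exact formula used by The Unscrambler is not reproduced. The data and the function name are made up.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals for a least-squares line y = a + b*x.

    Each raw residual is divided by s * sqrt(1 - h), where s is the residual
    standard deviation and h is the leverage of the sample.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    leverages = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverages)]


# Made-up data, for illustration only.
values = studentized_residuals([0.0, 1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 2.2, 2.9, 4.0])
suspects = [i for i, r in enumerate(values) if abs(r) > 2.0]
print(values, suspects)
```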
        Select Plot - Predicted vs Measured and choose Predicted vs Measured. If necessary, use View -
        Trend Lines - Regression Line to display the regression line (blue), and View - Trend Lines - Target
        Line to visualize the y=x line (black) (Lookup Image D017).
        You can see how the design samples are spread around the regression line; in particular, the Center samples to
        the right of the plot show a large spread. This is why so few effects in the model are very significant:
        There is quite a large amount of experimental variability.
        Task
        Interpret the response surface plots.
        How to Do It
        The landscape plot displayed in the lower right quadrant shows you the shape of the response surface: a kind of
        round hill with a maximum somewhere between the center and maximum values of the design variables.
        That plot is not precise enough to spot the coordinates of the maximum; the contour plot displayed left
        (Lookup Image D018) is better suited for that purpose. For instance, you can change the scaling to zoom
        around the optimum, so as to locate its coordinates more accurately. Check that they match what is displayed
        in the ANOVA table.
        Finally, you may also have noticed that the Predicted Max Point Value is smaller than several of the actually
        observed Yield values (sample Cube004a for instance has a Yield of 98.7). This is not paradoxical, since the
        model smoothes the observed values; those high observed values might not be reproduced if you performed the
        same experiments again.
        Since there was no apparent lack of fit, no outliers, and the residuals showed no clear pattern, the model could
        be considered valid and its results interpreted more thoroughly.
        The response surface showed an optimum predicted Yield of 96.747 for TiCl4 = 0.835 and Morpholine = 6.504;
        the predicted Yield is larger than 95 in the neighboring area, so that even small deviations from the optimal
        settings of the two variables will give quite acceptable results.
        The training samples are divided into three Sample Sets, each containing 25 samples. The three Sets are:
        Setosa, Versicolor, and Virginica. The Sample Set Testing will later be used to test the classification.
        Four variables are measured; Sepal length, Sepal width, Petal length, and Petal width. The measurements
        are given in centimeters.
Note: You will find the illustrations for this tutorial (Image E001, etc) at the end of the document.
        Task
        Insert a category variable into the Tutor_e data table.
        Enter the right type for each of the 75 test samples. A simple way to do this is as follows:
        Click on the first cell containing m. From the keyboard, type in m (which activates the entry mode on the
        cell) then v (initial of Versicolor), followed by <Enter>. You are now positioned in the next cell; apply the
        same procedure, until you reach the first Setosa sample. There, type in m and s followed by <Enter>. Go
        on like this, until you reach the first Virginica sample. There, type in m, v and v (we need to type in
        v twice to activate the second level which has v as initial).
        Save the data table once you have completed this task.
        Task
        Make a PCA model of all calibration samples.
        How to Do It
        Use Task - PCA and select the following parameters:
          
          Samples: Training
          
          Variables: Measurements
          
          Weights: 1/SDev
          
          Validation Method: Leverage correction
          
          Number of PCs: 4
        We assume that you are familiar with making models by now. Refer to one of the previous tutorials if you have
        trouble finding your way in the PCA dialog.
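The 1/SDev weighting gives every variable the same variance before the PCA, so that no measurement dominates the model simply because of its scale. A minimal Python sketch of that preprocessing, assuming the sample standard deviation (n - 1 in the denominator) and mean centering; the function name and the tiny table are made up and are not the Tutor_e data.

```python
from statistics import mean, stdev


def autoscale(table):
    """Center each column and multiply it by 1/SDev.

    table: list of rows, one row per sample and one column per variable.
    """
    columns = list(zip(*table))
    centers = [mean(col) for col in columns]
    scales = [stdev(col) for col in columns]
    return [
        [(value - c) / s for value, c, s in zip(row, centers, scales)]
        for row in table
    ]


# Made-up measurements: the second variable varies twice as much as the first.
scaled = autoscale([[5.0, 1.0], [6.0, 3.0], [7.0, 5.0]])
print(scaled)
```

After scaling, both columns have the same spread, so they contribute equally to the principal components.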
        You see that there are few outlier warnings and most of the variance is explained by three PCs. Click View to
        look at the modeling results.
        Activate the score plot and select Edit - Options. Enable sample grouping and select Value of Variable in
        the Group By field. Make sure Leveled Variable 1 is selected. Click OK . (Lookup Image E003) You can
        see the three groups in different colors; one very distinct (Setosa) and two that are not so well separated
        (Versicolor and Virginica). This indicates that it may be difficult to differentiate Versicolor from Virginica.
        Task
        Make PCA models for the three classes Setosa, Versicolor, and Virginica.
        How to Do It
        Go back to the Editor window containing your re-formatted data table. Select Task - PCA and make a model
        with the following parameters:
         
         Samples: Setosa
         
         Variables: Measurements
         
         Weights: 1/SDev
         
         Validation: Leverage correction
         
         Number of PCs: 4
        When the model is computed, close the PCA Progress dialog and save the class model with name Setosa.
        Repeat the procedure successively on Sample Sets Versicolor and Virginica, also saving each new PCA
        model.
        Task
        Assign the Sample Set Testing to the classes Setosa, Versicolor, and Virginica.
        How to Do It
        Select Task - Classify. Use the following parameters: (Lookup Image E004)
        Make sure that Centered Models is checked. Add the three models Setosa, Versicolor, and Virginica.
        The suggested number of PCs to use is 3 for all models; keep that default (it is based on the variance curve for
        each model). If you are curious, you may select a model in the list and click Variance to display the
        calibration and validation variances for that model.
        Click OK to start the classification.
        Task
        Interpret the classification results displayed in a table plot.
        How to Do It
        Click View when the classification is finished. (Lookup Image E005)
        A table plot is displayed, called Classification Table. There are three columns: one for each class model.
        Samples recognized as members of a class (they are within the limits on sample-to-model distance and
        leverage) have a star * in the corresponding column.
        The significance level can be toggled with the Significance option, which is available as a drop-down menu
        from the menu bar.
        At the 5% significance level, we can see that all but three samples (false negatives: virg1, virg36, virg42) are
        recognized by their rightful class model.
        However, some samples are classified as belonging to two classes (false positives): 12 Versicolor samples are
        also classified as Virginica, while 6 Virginica samples are also classified as Versicolor. Only the Setosa
        samples are 100% correctly classified (no false positives, no false negatives).
        If you raise the significance level to 25%, the number of false positives decreases but the number of
        false negatives increases (vers41 and Virg35 are added).
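        The terms false negative and false positive used above can be made concrete with a small tally over a
        classification table. This is a hedged sketch in Python: the function tally_classification and the
        membership data are hypothetical, and only illustrate how the stars in the table would be counted.

```python
def tally_classification(memberships, true_class):
    """Return (false_negatives, false_positives) as lists of sample names.

    memberships: sample name -> set of class models that accepted the sample
    true_class:  sample name -> class the sample is known to belong to
    """
    false_negatives = []
    false_positives = []
    for sample, accepted in memberships.items():
        own = true_class[sample]
        if own not in accepted:
            # Rejected by its rightful class model
            false_negatives.append(sample)
        if accepted - {own}:
            # Accepted by at least one other class model
            false_positives.append(sample)
    return false_negatives, false_positives


# Hypothetical excerpt of a classification table (not the actual tutorial results)
true_class = {
    "seto3": "Setosa",
    "vers7": "Versicolor",
    "virg1": "Virginica",
    "virg9": "Virginica",
}
memberships = {
    "seto3": {"Setosa"},
    "vers7": {"Versicolor", "Virginica"},
    "virg1": set(),
    "virg9": {"Virginica"},
}

false_neg, false_pos = tally_classification(memberships, true_class)
```

        A sample can be both a false negative and a false positive at the same time, which is why the two lists
        are built independently.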
        How to Do It
        Select Plot - Classification and choose the Coomans plot for models Virginica and Versicolor. (Lookup
        Image E006)
        This plot displays the sample-to-model distance for each sample to two models. The newly classified samples
        (from sample set Testing) are displayed in green color, while the calibration samples for the two models are
        displayed in blue and red. (Lookup Image E007)
        The Coomans plot for the classes Virginica and Versicolor shows that all Setosa samples are far away from
        the Virginica model (they appear far to the right). However, we can see that many Virginica and Versicolor
        samples are within the distance limits for both models. This suggests some classification problems.
        Task
        Look at the Si vs. Hi plots.
        How to Do It
        Select Plot - Classification and choose Si vs. Hi for model Versicolor. Before you start interpreting the plot,
        turn on Sample Grouping in the Options dialog and choose Name as Markers Layout, with length 2 (tick
        only the first two boxes in the Name field). (Lookup Image E008) The plot is much easier to interpret: iris
        type appears clearly with the initials Se, Ve, Vi in three different colors.
        Some Virginica samples are classified as belonging to the class Versicolor, but most samples that are not
        Versicolor are outside the lower left quadrant. The reason for the difficult classification between Versicolor
        and Virginica is that the samples are overlapping in the score plot. They are very similar with respect to the
        width and length of the sepal and petal.
        Task
        Look at the Model Distance plots.
        This plot allows you to compare different models. A distance larger than three indicates good class
        separation, meaning that the models are clearly different.
        It is clear from this plot that the Setosa model is different from the Versicolor, while the distance to Virginica
        is smaller.
        Task
        Look at the Discrimination Power plots.
        How to Do It
        Select Plot - Classification and choose the Discrimination Power for Versicolor projected onto the
        Setosa model.
        This plot shows which variables are most useful in describing the difference between the two types of
        iris. (Lookup Image E010) We can see that variables Sepal Length and Sepal Width have high
        discrimination powers (7.5-8), while Petal Length and Petal Width have lower ones (4.5-5).
        Do the same for Versicolor onto Virginica: all variables have discrimination powers lower than 5. This is
        obviously not enough.
        Task
        Look at the Modeling Power plots.
        How to Do It
        Select Plot - Classification and choose the Modeling Power for Versicolor.
        Variables with a modeling power near one are important for the model. A rule of thumb says that variables
        with modeling power less than 0.3 are of little importance for the model.
        The plot tells us that all variables have a modeling power larger than 0.3, which means that all variables are
        important for describing the model. None of the variables should be deleted from the modeling. The only
        chance to improve on the classification between Versicolor and Virginica is to measure some additional
        variables.
        In this tutorial we show you some of the capabilities The Unscrambler has to interact with other programs
        under the Windows operating system. The main focus here is how The Unscrambler is used in conjunction
        with other software.
The water content of wheat samples was measured and is the response variable in the data.
Note: You will find the illustrations for this tutorial (Image F001, etc) at the end of the document.
        Task
        Import the ASCII file Tutor_F.txt.
        This launches the Import ASCII dialog, where you specify what the ASCII file looks like (Lookup Image
        F001). Use the options displayed in the dialog. Note that the first row in the data file contains variable names
        and the first column contains sample names.
Click OK to import the file and the data are read into an Editor.
        Task
        Import the data file Tutor_F.xls from Excel.
        How to Do It
        There are two procedures. Use Procedure I if you have Excel or Lotus installed on your Personal Computer or
        Procedure II if you do not have a spreadsheet program that can read the file Tutor_F.xls. You only need to
        follow one of the procedures.
        You are now going to drag the selected data area to the first variable in the Editor in The Unscrambler. Hold
        down <Ctrl> and click on one of the sides of the marked area; the cursor changes and you see a + sign on top
        of the cursor. Drag the data from Excel to the Editor in The Unscrambler that contains the wheat data. Note
        how a frame marks the data area that is covered by the data you copy. Let go of the left mouse button when
        you see that the frame covers the first variable completely, i.e. from sample 1 and down.
        The dialog Select Drop Method appears (Lookup Image F002). Select Insert as 1 new column. Import
        the sample and variable names from Excel the same way.
        Find the file Tutor_F.xls in the Import dialog and click Import. This launches the Import Worksheet
        dialog, where you specify the options (Lookup Image F004). The Excel file is prepared by defining Range
        names. Select Water Content for Range names against Data and specify A2:A56 in the Sheet range. Delete
        the entries A1:A1 in the Sheet range for Sample names when you import data without names. In the Sheet
        range field against Variable names specify A1:A1. Then click OK.
        Task
        Insert a category variable to group the samples into three categories, depending on the water content level.
        How to Do It
        Place the cursor in the first column and select Edit - Insert - Category variable. This launches the
        Category Variable Wizard - Enter Variable Name and Choosing Method dialog (Lookup Image
        F005). Enter a name for the variable in the first dialog, select I want to specify the levels manually under
        Method and click Next to enter the next dialog, where you specify the levels. Add three levels: Low (Water <
        13.0), Medium (13.0 < Water < 15.0), and High (Water > 15.0).
        Enter the category values according to the distribution above. Double-click the category variable cell and select
        the drop-down list. A list of the valid levels is displayed. A faster way to enter the value is to double-click the
        cell and type the first character of the desired level. Type the character repeatedly if several levels begin with
        the same character.
        The name of the category variable is written in blue text to distinguish this kind of variable from the ordinary
        ones.
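        The grouping rule behind the three levels can be sketched in a few lines of Python. This is an
        illustration only: the function water_level is hypothetical, and assigning the boundary values 13.0 and
        15.0 to Medium is an assumption, since the tutorial does not say where they belong.

```python
def water_level(water, low_limit=13.0, high_limit=15.0):
    """Map a water content value to Low / Medium / High.

    Assumption: values exactly equal to a limit are counted as Medium.
    """
    if water < low_limit:
        return "Low"
    if water > high_limit:
        return "High"
    return "Medium"


# Hypothetical water contents, not taken from Tutor_F
example_levels = [water_level(value) for value in (12.4, 13.0, 14.2, 15.6)]
```

        Applying such a rule to the water content column gives the level to enter in each category cell.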
        Task
        Define the Variable Sets NIR Spectra and Water Content. Change the data table properties to Spectra.
        How to Do It
        In the Editor, mark variable number two which now contains the water content of the wheat samples. Select
        Modify - Edit Set and make sure that Variable Sets is selected. Click Add to define the Set Water
        Content from current Editor. Define another Variable Set NIR Spectra using variables 3-22. Change the
        Data Type to Spectra for both. We do not need to define a Sample Set because All Samples is automatically
        defined as a Set.
        Task
        Make a PLS1 model from NIR spectra to the Water Content.
        How to Do It
        Select Task - Regression and specify the following parameters in the Regression dialog:
          
          Method: PLS1
          
          Samples: All Samples [55]
          
          X-variables: NIR Spectra [20]
          
          Y-variables: Water Content [1]
          
          Weights: All 1.0
          
          Validation method: Leverage Correction
          
          Number of components: 5
You see how the model describes more and more of the water content.
        Task
        Look at the model results.
        How to Do It
        Click View in the dialog when the model is made. The following plot appears: (Lookup Image F006)
        The residual Y-variance goes down nicely and is close to 0 after two PCs. The Predicted vs Measured plot
        looks OK. The fit is quite good. From the regression Coefficients we see that there is a distinct peak around
        1940.
        Task
        Transfer plots from The Unscrambler into Word using Copy and Paste.
        How to Do It
        Open Word. Select the score plot in the regression overview plot and select Edit - Copy. Go to Word and
        place the cursor where you want the plot to appear. Select Edit - Paste. The score plot is now inserted as a
        graphical object in your Word document.
        The plot can be transferred either as a bitmap or a picture file. The picture file option will usually give better
        quality of the plot, but also larger Word files. You may want to use the bitmap option if you transfer plots with
        many plot objects.
You choose between the two options from File - System Setup and the Viewer tab.
        Task
        Export an ASCII-MOD file.
        How to Do It
        Open Results - Regression and select the PLS model you made. The usual thing to do would be to open
        the regression overview plot and look at the different predefined plots for this model. This time, we take a
        look at the numerical results that are available in the model instead.
        Click Variance to see the variances for different PCs in the model (Lookup Image F007). Scroll through
        the information field to look at properties of your model.
        Take a look at the ASCII file that is generated. The format of the file is described in chapter Technical
        References, available as a .PDF file from CAMO's web site www.camo.com/TheUnscrambler/Appendices.
        How to Do It
        Activate the Wheat Editor and select File - Export. Make sure that the File Format is set to Flat ASCII /
        Wide ASCII before you Click OK. Specify the ASCII file as suggested in the Export ASCII dialog (Lookup
        Image F008).
        Wide ASCII means that each sample is written as a row in the ASCII file with a paragraph mark to tell the end
        of the row. The sample and variable names are written as the first column and first row in the ASCII file.
Open the file in an ASCII editor and look at the file. All names are enclosed in double quotes.
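        To make the Wide ASCII layout concrete, the sketch below writes a small table in the same spirit: one
        sample per row, variable names in the first row, sample names in the first column, and all names in
        double quotes. It is a Python illustration, not the export routine of The Unscrambler. The function name
        to_wide_ascii, the comma delimiter and the sample values are assumptions.

```python
import csv
import io


def to_wide_ascii(sample_names, variable_names, rows, delimiter=","):
    """Return the table as text: one line per sample, names in double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_NONNUMERIC,  # quote text fields, leave numbers bare
        lineterminator="\n",
    )
    # First row: empty corner cell followed by the variable names
    writer.writerow([""] + list(variable_names))
    # One row per sample: sample name followed by its values
    for name, values in zip(sample_names, rows):
        writer.writerow([name] + list(values))
    return buffer.getvalue()


# Hypothetical two-sample table
wide_text = to_wide_ascii(
    ["s1", "s2"],
    ["Water", "1940"],
    [[12.5, 0.31], [14.0, 0.35]],
)
wide_lines = wide_text.splitlines()
```

        Reading such a file back is straightforward because the quotes separate names from numbers even when a
        variable name, such as a wavelength, looks numeric.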
        A fruit punch is to be prepared by blending three types of fruit juice: watermelon, pineapple and orange. The
        purpose of the manufacturer is to use their large supplies of watermelons by introducing watermelon juice, of
        little value by itself, into a blend of fruit juices. Therefore, the fruit punch has to contain a substantial amount
        of watermelon - at least 30% of the total. Pineapple and orange have been selected as the other components of
        the mixture, since juices from these fruits are easy to get and relatively inexpensive.
        The manufacturer decides to use experimental design to find out which combination of those three ingredients
        maximizes consumer acceptance of the taste of the punch.
The responses of interest for the manufacturer are detailed in the table below.
        Consumer acceptance is the most important response, but if the analysis of the results should reveal two areas
        with equally high consumer acceptance, the mixture with lower production cost will be preferred. The sensory
        descriptors are here to provide an explanation for consumer acceptance and directions for further improvement
        (for instance by adding sugar or sweetener if the consumers seem to prefer sweeter mixtures).
Note: You will find the illustrations for this tutorial (Image G001, etc) at the end of the document.
        Task
        Build a simplex centroid design with the help of the Design Wizard.
        How to Do It
        Use File - New Design to start the Design Wizard. The first dialog is Design Wizard - Select Method to
        Use, where you select option From Scratch and click Next to proceed.
        You enter dialog Design Wizard - Select Design Type, where you select option Mixture Design
        (Lookup Image G001); you can see that the contents of the Information field at the bottom of the dialog
        box are updated and give you some advice about the selected type of design. For instance, the last sentence
        states that for optimization purposes we should add interactions and squares to our model. We will remember
        that!
        Click Next; this starts the Design Wizard - Define Mixture Variables dialog where we will create a new
        variable for each of our fruit juices. Click New to access the Add Design Variable dialog. Type in the
        details of the first fruit juice (Lookup Image G002):
        Name: Watermelon
        Lower Bound: 30 %
        Upper Bound: 100 %
        Click OK to accept your choices and go back to the Design Wizard - Define Mixture Variables dialog.
        Apply the same procedure to specify the other fruit juices (Pineapple and Orange, varying from 0% to 70%).
        In the next dialog, you have the possibility to define Process Variables (i.e. other design variables which are
        not part of the mixture). As we do not need any of those, just click Next.
        You are now in the Design Wizard - Define Non-design Variables dialog, where you should specify
        your responses. Click New to access the Add Non-design Variable dialog; type in the name of the
        response variable. Do that for each response: Accept, Cost, Sweet, Bitter, Fruity. Click Next when all five
        responses are specified.
        The next dialog, called Design Wizard - Define Model, allows you to add terms to a default linear model.
        As you can see (Lookup Image G004), the only available choice in our case is Mixture Interactions and
        Squares. Tick that box and proceed with Next.
        This leads you to the Design Wizard - Define Design Purpose dialog, where the system detects that your
        purpose must be Optimization since you have added interactions and squares. Click Next to proceed.
        The next dialog is Design Wizard - Design Type (Mixture). It recommends a Simplex-Centroid Design
        with Interior Points (Lookup Image G005), and we accept that choice. Click Next to proceed.
        In the next dialog called Design Wizard - Design Details, we accept the default choice of 1 Replicate and
        3 Center Samples and click Next. In the Design Wizard - Randomization Details (General) dialog,
        just click Next to proceed.
        In the Design Wizard - Last Checks dialog, check that all details of the design are correct (Lookup
        Image G006). Should anything be different from what you were supposed to have chosen, go back as many
        dialogs as necessary with the Back button, then move forward again.
        Click Preview to have a look at the randomized list of experiments. If you are not happy with the
        randomization, click OK to go back to the main dialog then Re-randomize to start a new randomization (then
        click Preview again to check the result). If you wish to print out the randomized list of experiments, click Lab
        Report then OK.
        Once you have made all necessary checks and corrections, click Finish; this displays an information dialog
        (Lookup Image G007) (click OK after reading its contents) and opens the new designed data table into the
        Editor (Lookup Image G008). Be aware that if you need to do any further corrections after that, you will
        have to use command File - Duplicate - As Modified Design to access the Design Wizard once again.
        Save the new table with File - Save (you may call it Fruit Punch empty for instance).
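        The points of such a design can be sketched outside the Design Wizard. The Python code below is an
        illustration under stated assumptions, not the algorithm used by The Unscrambler: it builds a textbook
        simplex-centroid design (vertices, edge midpoints, overall centroid), adds interior points halfway between
        the centroid and each vertex, counts the 3 Center Samples as three runs of the centroid in total, and
        maps everything into the region where Watermelon is at least 30%. The function name is hypothetical.

```python
from itertools import combinations


def simplex_centroid_with_interior(lower_bounds, total=100.0, center_samples=3):
    """Return design points (in %) for a mixture with lower bounds only."""
    q = len(lower_bounds)
    free = total - sum(lower_bounds)  # share left once lower bounds are met

    # Simplex-centroid points in pseudo-component coordinates (each sums to 1):
    # subsets of size 1 are vertices, size 2 edge midpoints, size q the centroid.
    pseudo = []
    for size in range(1, q + 1):
        for subset in combinations(range(q), size):
            pseudo.append([1.0 / size if i in subset else 0.0 for i in range(q)])

    vertices = pseudo[:q]
    centroid = pseudo[-1]

    # Interior points: halfway between the centroid and each vertex (assumption)
    for vertex in vertices:
        pseudo.append([(v + c) / 2.0 for v, c in zip(vertex, centroid)])

    # Extra runs of the center so that it appears center_samples times in total
    pseudo.extend([centroid] * (center_samples - 1))

    # Map pseudo-components back to real percentages
    return [
        [lb + free * z for lb, z in zip(lower_bounds, point)]
        for point in pseudo
    ]


# Watermelon >= 30 %, Pineapple and Orange >= 0 %
design = simplex_centroid_with_interior([30.0, 0.0, 0.0])
```

        Under these assumptions the design has 12 runs, which agrees with the All Samples (12) count shown in
        the later dialogs of this tutorial, and Pineapple and Orange never exceed 70%.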
        Task
        Import response values from Excel into your designed data table.
        Click OK after double-checking your choices: you are now back in your data table, with the response values
        filled in.
        Select File - Save As and give the table a new name (for instance Fruit Punch).
        Task
        Run Statistics, display the results as plots, check response variations and look for abnormal values.
        How to Do It
        With your Fruit Punch data table displayed in the Editor, select Task - Statistics.
        Choose the following settings in the Statistics dialog:
        Sample Set: All Samples (12)
        Variable Set: Cont Non-Design Vars (5)
        Calculate Cross-Correlation: not selected
        then click OK to start the computations.
        Click View in the Statistics Progress dialog: the Statistics results are displayed as two plots (Lookup
        Image G011). The upper plot is Percentiles , the lower Mean and SDev.
        Save the results file as Fruit Punch Stats.
        Now we are going to display the same two plots for Design samples and Center samples, in order to compare
        variation over the whole design to variation over the replicated Center samples. If the experiments have been
        performed correctly, there should be much more variation among design points than among the three replicates
        of the Center sample.
        Select Plot - Statistics; in the Statistics dialog (Lookup Image G012), look at the Compressed sheet
        and focus on the Sample Groups field. Design should already be selected; select Center as well (you can
        see that the plot preview is updated as a result, now showing several groups in different colors) and click OK.
        The Percentiles and Mean and SDev plots are now displayed for two groups (Lookup Image G013). The
        bars or boxes for Design samples appear in blue and for Center samples, in red (unless you are using your own
        color scheme).
        On the Percentiles plot, you can see that there is much more variation among design points than among the
        Center samples. This also appears clearly on the Mean and SDev plot: for instance, if you click successively
        on the blue and red bars for variable Accept, you will see that SDev is 0.75 for Design samples and only 0.25
        for Center samples.
        Conclusions:
         
         The ranges of variation of the 5 responses are as expected.
         
         There is no abnormal value for any response.
         
         There is much more variation over the whole design than among the Center samples, which suggests that
           the experiments were performed correctly.
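        The comparison between Design and Center samples boils down to computing one standard deviation per
        sample group. The following Python sketch shows the idea with made-up Accept scores. The function
        sdev_by_group is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import stdev


def sdev_by_group(values, groups):
    """Return {group name: sample standard deviation of its values}."""
    grouped = {}
    for value, group in zip(values, groups):
        grouped.setdefault(group, []).append(value)
    return {group: stdev(members) for group, members in grouped.items()}


# Hypothetical Accept scores: six design points followed by three center replicates
accept = [2.0, 3.5, 1.5, 3.0, 2.5, 3.8, 3.0, 3.2, 2.8]
groups = ["Design"] * 6 + ["Center"] * 3

sdev = sdev_by_group(accept, groups)
```

        A Center SDev that is clearly smaller than the Design SDev is what you expect when the replicated
        experiments are reproducible.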
        Task
        Build a PLS model of the response variations, validate it with cross validation and uncertainty testing. View
        the results and check the model.
                                    Method                      PLS2
                                    Sample Set                  All Samples (12)
                                    X-Variables                 Design Def Model (3+6)
                                    Weights for X-vars          All 1/SDev
                                    Y-Variables                 Cont Non-Design Vars (5)
                                    Weights for Y-vars          All 1/SDev
                                    Validation Method           Cross Validation
                                    Uncertainty test            Selected
                                    Model Size                  Full
                                    Num PCs                     5
                                    Issue Warnings              Selected
        Click OK, then have a look at the PLS2 Regression Progress dialog (Lookup Image G014). The
        model needs 4 PCs, and even then the Y-validation variance is quite high (0.50). We can also see that several
        warnings have been issued, especially for PC 0 (that is to say, at the Centering stage of the computations) and
        PC 1.
        This suggests some problems in the data (maybe an outlier?). We will have to investigate.
        Click View to access the Viewer where the regression results are displayed.
        Note: Since this is a mixture model, all terms of the model are linked. Therefore it would be meaningless to
        remove the non-significant effects from the model. This is why we do not mark the non-significant
        coefficients nor recalculate the model without the marked variables, as we would have done in another context.
        From the X-variables sheet (Lookup Image G021), choose the following:
                                   Axis 1           Watermelon(A)
                                   Axis 2           Pineapple(B)
                                   Axis 3           Orange(C)
        Double-check your choices then click OK. The Response Surface plot for variable Accept is now displayed in
        the upper left sub-view.
        Do the same in the other three sub-views with responses Sweet, Bitter and Fruity (Lookup Image G022).
        Have a look at the four response surfaces and interpret them.
        You may copy one of the plots to sub-view 1 (with Window - Copy To - 1) so as to study it in more detail.
        Let us do so with response Accept (Lookup Image G023). We can see that consumer acceptance is low
        (blue curves) for mixtures with high Watermelon or high Pineapple contents.
        Maximum acceptance is reached for a fruit punch with relatively high Orange and low Pineapple. By
        clicking on that point we can display its coordinates (A = 38.75, B = 16.04, C = 45.21) and the Accept value
        (3.76).
        Conclusions:
          
          With the help of the Y-variance curve, the Influence plot and the Outlier List, we have found an error in
            the data.
          
          Once the punching error has been corrected, the PLS2 model has good quality (high explained Calibration
            and Validation Y-variance).
          
          The Correlation Loadings show the underlying logic in response variations.
          
          The Regression Coefficients have large uncertainties for response Accept, but are better for the sensory
            responses.
          
          The Response Surface plots show maximum consumer acceptance for a fruit punch with about 39%
            Watermelon, 16% Pineapple and 45% Orange.
        Fluorescence spectroscopy is able to distinguish similar molecules and can discriminate identical molecules in
        different chemical environments. This is possible because excitation spectra can be scanned at specified
        emission wavelengths and emission spectra at specified excitation wavelengths (EEM scans). This procedure
        results in 3-D graphs of the fluorescence intensity with respect to different excitation and emission
        wavelengths. But the EEM data are strongly intercorrelated and difficult to interpret. Standard unfolding
        methods often give unsatisfactory results. We will use a three-way analysis approach to overcome this
        problem.
        Severity (Y Data)
        The Y data is found in table Tutor_h_Y2D, consisting of 32 rows for the 32 woodchip samples and one
        column, Severity.
        Severity of steaming is a measure reflecting the duration and temperature of steam treatment. The spruce and
        beech samples were treated with steam at temperatures from 160°C to 220°C. The Severity values range from
        1.7 to 3.5.
Note: You will find the illustrations for this tutorial (Image H001, etc) at the end of the document.
        Task
        Toggle 3D data layouts.
        How to Do It
        Open the data file Tutor_h_X3D by selecting File - Open. It is a file of type 3D Data. (Lookup Image
        H001)
        The table opens in the 3D Editor. It is a table of OV² layout (1 object mode, 2 variable modes), therefore its
        column numbers are two-fold. For example, column 1:6 corresponds to primary variable number 1 (Excitation
        wavelength 250 nm) and secondary variable number 6 (Emission wavelength 350 nm). (Lookup Image
        H002)
        Toggle the layout several times (Ctrl+3) until you are back to an OV² table of size 32 x (66 x 31), that is to say
        32 samples, 66 Primary Variables and 31 Secondary variables. The size of the table is shown at the bottom
        right corner of the Editor. (Lookup Image H003)
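        The two-fold column numbers can be related to a plain column count with a little arithmetic. The Python
        sketch below is an illustration only. The helper names are hypothetical, and it assumes that the secondary
        variable runs fastest when the 66 x 31 variables are laid out side by side.

```python
N_PRIMARY = 66    # excitation wavelengths in the tutorial table
N_SECONDARY = 31  # emission wavelengths in the tutorial table


def unfolded_column(label, n_secondary=N_SECONDARY):
    """Convert a two-fold label such as '1:6' to a 1-based flat column number."""
    primary, secondary = (int(part) for part in label.split(":"))
    return (primary - 1) * n_secondary + secondary


def twofold_label(column, n_secondary=N_SECONDARY):
    """Convert a 1-based flat column number back to a 'primary:secondary' label."""
    primary, secondary = divmod(column - 1, n_secondary)
    return f"{primary + 1}:{secondary + 1}"


total_columns = N_PRIMARY * N_SECONDARY
last_column = unfolded_column("66:31")
```

        With 66 primary and 31 secondary variables, each of the 32 samples is therefore described by 2046
        values.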
        Task
        Study the raw data by plotting the fluorescence spectra of a few wood samples.
        How to Do It
        Go to menu Plot - Matrix 3-D and select sample 13, BFFi (Beech, Fresh wood, Fine grinding). The
        excitation-emission spectrum for this sample is displayed in the Viewer. (Lookup Image H004)
        You may use the Rotate option (toolbar button or View - Rotate) to view the spectral landscape from various angles.
        Use either the mouse or the arrow keys on your keyboard to rotate the plot. Holding your finger on an arrow
        key will allow a continuous rotation of the plot; pressing the Alt Gr key at the same time will slow down the
        rotation.
        Menu Edit - Options (or the corresponding toolbar button) allows you to change the Plot Layout from a
        3-dimensional Landscape
        view into Contour or Map. (Lookup Image H005)
        Go back to the 3D Editor and use menu Plot - Matrix 3-D to plot sample 29, SFFi (Spruce, Fresh wood, Fine
        grinding). (Lookup Image H006)
Close your various matrix plots before proceeding with the tutorial.
        Task
        Define a Primary Variables set and a Secondary Variables set.
        How to Do It
        Go to menu Modify - Edit Set or use the corresponding shortcut Ctrl+E. This opens up the Set Editor
        dialog. (Lookup Image H007)
        Click on the Add button to open the New Primary Variable Set dialog. Use the following settings:
         
         Name: Excitation 320-540 nm
         
         Data type: Spectra
         
         Interval: 15-59
           Alternatively, click the Select button and select wavelengths 320 to 540 nm in the Select Variables
           dialog.
        (Lookup Image H008)
        Click OK; you are back in the Set Editor dialog where you can see your Primary Variable Set.
        Use the drop-down list and select option Secondary Variable Set. (Lookup Image H009). Click on the
        Add button to open the New Secondary Variable Set dialog, and define a set as follows:
         
         Name: Emission 370-600 nm
         
         Data type: Spectra
         
         Interval: 8-31
           Alternatively, click the Select button and select wavelengths 370 to 600 nm in the Select Variables
           dialog.
        (Lookup Image H010)
        Click OK; you are back in the Set Editor dialog where you can see your Secondary Variable Set. (Lookup
        Image H011)
        Note!
        If you made any mistake in defining the variable sets, use the Properties button to return to the New
        Primary/Secondary Variable Set dialog and make corrections accordingly.
        Click OK; you are back in the 3D Editor. Use menu File-Save As to save the data sets information. You
        may call your new table Tutor_h_X3D with sets. (Lookup Image H012)
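        The Interval values entered above can be cross-checked against the wavelengths. The sketch below is a
        Python illustration, with a hypothetical function name. It assumes evenly spaced wavelengths: excitation
        starting at 250 nm in 5 nm steps, and emission starting at 300 nm in 10 nm steps. These step sizes are
        inferred from the variable numbers quoted in this tutorial, not stated by it.

```python
def interval_for(first_nm, step_nm, from_nm, to_nm):
    """Return the 1-based (first, last) variable numbers covering from_nm..to_nm."""
    first = (from_nm - first_nm) // step_nm + 1
    last = (to_nm - first_nm) // step_nm + 1
    return first, last


# Assumed wavelength grids (see the lead-in above)
excitation = interval_for(first_nm=250, step_nm=5, from_nm=320, to_nm=540)
emission = interval_for(first_nm=300, step_nm=10, from_nm=370, to_nm=600)

# Number of variables in each set
n_excitation = excitation[1] - excitation[0] + 1
n_emission = emission[1] - emission[0] + 1
```

        Under these assumptions the intervals come out as 15-59 and 8-31, containing 45 and 24 variables
        respectively.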
        Task
        Set up the options for a Three-Way PLS Regression and launch the model calculations.
        How to Do It
        Make sure that your 3D data table Tutor_h_X3D with sets is on screen. Select Task - Regression to
        open the Regression (Three-Way PLS) dialog. Choose the following options:
         
         Sample Set: All Samples [32]
         
         Match samples in X and Y Data Tables By row numbers
         
         Pri. X-Vars: Excitation 320-540 [45]
           Weights: All 1.0
         
         Sec. X-Vars: Emission 370-600 [24]
           Weights: All 1.0
         
         Y-Variable File: Tutor_h_Y2D
           Variable Set: Severity [1]
           Weights: All 1.0
         
         Validation Method: Cross Validation. Use the Setup button to choose Full Cross Validation
         
         Num PCs: 10
         
         Center Data: selected
        (Lookup Image H013)
        Note!
        In the Y-variables sheet, you may have to Browse to find the Y-Variable File Tutor_h_Y2D.
        Click OK to launch the calculations. The Three-Way PLS Regression Progress dialog appears. As the
        calculations run, the Y-Validation Residual Variance curve per cross validation segment is shown. When the
        calculations are over the Residual Y-Validation Variance curve for the global model is displayed. (Lookup
        Image H014)
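        As an illustration of what Full Cross Validation does, the sketch below leaves each sample out once,
        refits a model on the remaining samples, and averages the squared prediction errors. It is not The
        Unscrambler's implementation: the mean predictor stands in for the Three-Way PLS model, and the
        response values are hypothetical.

```python
def full_cross_validation(y, fit, predict):
    """Leave each sample out once; return the mean squared validation residual."""
    n = len(y)
    squared = []
    for left_out in range(n):
        train = [i for i in range(n) if i != left_out]
        model = fit(train)  # model built without the left-out sample
        squared.append((y[left_out] - predict(model, left_out)) ** 2)
    return sum(squared) / n

# Hypothetical responses, not tutorial data
y = [1.0, 2.0, 3.0, 4.0]

# Stand-in model (assumption): predict the mean of the training responses
def fit_mean(train_idx):
    return sum(y[i] for i in train_idx) / len(train_idx)

def predict_mean(model, sample_idx):
    return model

validation_variance = full_cross_validation(y, fit_mean, predict_mean)
print(round(validation_variance, 4))
```

        With 32 samples, full cross validation uses 32 segments, one per left-out sample.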
        Hit the View button. The Regression Overview opens, showing four default plots. These are (clockwise):
        Scores, X1-Loading Weights and Y-Loadings, Predicted vs. Measured, Residual Y-Validation Variance.
        (Lookup Image H015)
        How to Do It
        Go to menu Plot - Sample Outliers. Keep the default settings and click OK. Four plots appear in the
        Viewer: Scores, Influence, Y-Residual Sample Variance and X-Residual Sample Variance. (Lookup Image
        H016)
        Click on the Influence plot so that it is active, then use the X and Y toolbar buttons to display only X
        information, or only Y information, or both. Sample 18 (SOFi) is an outlier with a high Residual Y-Variance.
        Go to menu Edit - Mark - One By One or use the corresponding toolbar shortcut, then click on sample 18 in
        the Influence plot. This sample is now marked by a circle on all plots. (Lookup Image H017)
        Go to menu Task - Recalculate Without Marked. This brings up the Regression (Three-Way PLS)
        dialog, and you can observe that sample 18 is shown in the Keep Out of Calculation field.
        Check that the Cross Validation setup is still Full Cross Validation, and that the number of components
        (Num PCs) is 10. (Lookup Image H018)
        Click OK to compute a new model without sample 18 (Lookup Image H019). Click View to display the
        Regression Overview. Go to menu Plot - Sample Outliers and check that no sample is outlying in this
        new model. (Lookup Image H020)
        Go to menu File - Save and save the new model as Wood Severity_model 2.
        Tasks
        the regression coefficients and the Predicted vs. Measured plot.
        Task
        Interpret the Y-Residual Validation Variance plot and determine the optimal number of components (PCs).
        Note!
        If your plot differs from the picture, you may adjust it using this set of buttons:
        The Y-residual validation variance shows a plateau from PCs 7-8, in agreement with the suggested number of
        components given by the software. We decide to be conservative and use 7 PCs for this model.
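        One simple way to formalize "plateau" is to take the smallest number of PCs whose validation variance
        lies within a small tolerance of the minimum. The sketch below does this on made-up values; it is an
        illustrative rule, not the criterion The Unscrambler uses, and the 5% tolerance is an arbitrary
        assumption.

```python
def suggest_components(residual_variance, tolerance=0.05):
    """Smallest number of PCs whose validation variance is within
    `tolerance` (relative) of the minimum over all PCs.
    Index 0 of the list corresponds to 0 PCs."""
    best = min(residual_variance)
    for n_pcs, value in enumerate(residual_variance):
        if value <= best * (1 + tolerance):
            return n_pcs

# Hypothetical validation variances for 0..10 PCs (not tutorial data)
curve = [1.00, 0.60, 0.40, 0.30, 0.22, 0.15, 0.10, 0.052, 0.050, 0.051, 0.053]
suggested = suggest_components(curve)
print(suggested)
```

        Preferring the smaller of two nearly equivalent models is the same conservative choice made in the text.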
        Task
        Interpret the Scores plot and find out if there are any clear groups of samples.
        How to Do It
        Activate the Scores plot (map of samples) by clicking on it; it is the plot situated in the first quadrant. The
        sample names contain a lot of information. Let us focus on Wood type.
        Go to Edit - Options or click on the corresponding toolbar shortcut. This opens the Options dialog. In the
        Markers Layout field, choose option Name, then click on the first box. This will disable the following boxes,
        so that only the first character in the sample name will be kept (Lookup Image H022). Click OK. The sample
        names now only indicate S for Spruce wood (soft) or B for Beech wood (hard).
        Click on the Next Vertical PC button, or use the Up arrow key on your keyboard, to display the Scores
        for PC1 vs. PC3. We can observe that PC3 separates the Spruce samples (to the bottom) from the Beech
        samples (to the top). (Lookup Image H023)
        Task
        Interpret the X-Loading Weights and find out which information is carried by PC3.
        Click OK. The Loading Weights for excitation spectra (Primary variables, X1) appear in the top window and
        the Loading Weights for emission spectra (Secondary variables, X2) appear in the bottom window. (Lookup
        Image H025)
        PC3 is represented in green on the plots. On the top plot, it shows a peak for excitation 355 nm. On the bottom
        plot, it shows a peak for emission 400 nm.
        These peaks describe the CH3O functional groups of hardwood and softwood. The CH3O functional groups
        are higher in hardwood lignin than in softwood. This information is shown with PC3. The beech samples have
        higher scores than the spruce samples for this PC.
        Task
        Interpret the Regression Coefficients and find important absorption/emission bands.
        How to Do It
        Go to Plot - Regression Coefficients, and in the Regression Coefficients dialog choose the following
        settings:
         
         Plot type: Matrix
         
         X-variables: Primary X Vs Secondary X
         
         Y-variable: 1, Severity
         
         Components: 7
        Double click on the preview screen at the top of the dialog to enlarge the plot: the plot will be displayed in Full
        Window (Lookup Image H026)
        Click OK to display the regression coefficients plot. The plot is shown in landscape layout. (Lookup Image
        H027) We can observe four major areas presenting high regression coefficients (three positive, one negative).
        To better study the plot, use the rotate function (toolbar button or View - Rotate). Use either the mouse or
        the arrow keys on your keyboard to rotate the plot. Holding your finger on an arrow key will allow a continuous
        rotation of the plot; pressing the AltGr key at the same time will slow down the rotation.
        Menu Edit - Options (or the corresponding toolbar button) allows you to change the Plot Layout from a
        3-dimensional Landscape view into Map. Move your mouse over the Map plot to get the coordinates for
        excitation and emission wavelengths. (Lookup Image H028)
        Task
        Interpret the Predicted and Measured plot and find out which samples are best predicted.
        How to Do It
        Go to Plot - Predicted vs Measured. In the dialog, choose the following settings:
         
         Plot type: Predicted and Measured
         
         Y-variable: 1, Severity
         
         Components: 7
         
         Samples: Calibration
        (Lookup Image H030)
        Click OK to display the plot. The blue curve corresponds to our model, while the red curve corresponds to the
        measured values. The model fits well overall, yet several samples are not as well predicted as the others.
        Moving the mouse over these samples to identify them shows that fresh wood samples (F) are generally
        better predicted than old wood samples (O). (Lookup Image H031)
        The RMSEC for the model is accessible from Plot - Predicted vs Measured. Choose settings:
         
         Plot type: Predicted vs Measured
         
         Y-variable: 1, Severity
         
         Components: 7
         Samples: Calibration
         
        The RMSEC is 0.11, for steam treatment severity values ranging from 1.7 to 3.5. This is about the size of
        the reproducibility of the severity measurement.
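        To put the RMSEC in perspective, the sketch below shows a common root-mean-square error formula and
        expresses 0.11 as a fraction of the severity range quoted above. The plain mean over n samples is an
        assumption; the exact denominator used by The Unscrambler may differ. The function name rmse is
        hypothetical.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(measured))

# Values quoted in the tutorial
rmsec = 0.11
severity_min, severity_max = 1.7, 3.5

# RMSEC relative to the span of the calibration responses
relative_error = rmsec / (severity_max - severity_min)
print(f"RMSEC is about {relative_error:.1%} of the severity range")
```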
Note: You will find the illustrations for this tutorial (Image I001, etc) at the end of the document.
        Variables
        The first three variables are concentration measurements of blue, green and orange dyes. Variables 4 to 59 are
        UV/Vis spectra measured at range 250-800 nm with step 10 nm. In the Set Editor dialog box, select the
        Variable Sets option to see the list of existing variable sets. (Lookup Image I003)
        When you have seen the sets, click OK to leave this box and return to the data table.
        Task
        Plot the spectra of all mixture samples together:
        How to Do It
         1. Select the mixture samples 4-39 (either directly from the Editor, or with Edit - Select Samples; the set
            you are interested in is called Mixture).
         2. Use Plot - Line (or the corresponding button from the toolbar) and choose Variable set 250-800nm as
            scope for the plot. (Lookup Image I004)
        To plot the reference spectra of the three pure components, select samples 1-3 and make a Line plot of
        Variable set 250-800nm. (Lookup Image I005)
        To plot the reference concentrations of the three dyes, select columns 1-3 and make a Line plot of Sample set
        Mixture. (Lookup Image I006)
        Note:
        Reference measurements of spectra and concentrations of pure components are not necessary to make your
        data set suitable for MCR!
        Task
        Set up the options for an MCR analysis, launch the calculations and plot results.
         Task
         Plot MCR results for various numbers of pure components.
         How to Do It
         Actually, the Unscrambler MCR procedure generates several sets of results, covering a number of estimated
         pure components from 2 to <optimum +1>. By default, the results are plotted for the optimal number of
         components.
         You may view the results for varying numbers of pure components. Let us plot the spectral profiles for a
         2-component solution. Click on the Estimated Concentrations plot to make it active (blue frame), then click
         Plot - Estimated Spectra, select Number of Components as 2, and Profiles 1-2 as shown. (Lookup Image
         I010)
         Click OK: the plot of estimated spectra for a resolution with two pure components is displayed.
         In a similar manner, click on the bottom left subview to make that plot active, then use Plot - Estimated
         Spectra, to plot the 4-component solution.
         MCR fitting and Principal Component Analysis (PCA) fitting results are also available for varying numbers of
         pure components from 2 to <optimum +1>. Each fitting includes Variable Residuals, Sample Residuals and
         Total Residuals plots. The plot of Total Residuals for MCR fitting is shown by default in the lower-right
         subview. Like any other plot, it can also be accessed from the Plot menu. Click and activate the lower-right
         subview, then click Plot - Residuals. In the MCR fitting tab, select Total Residuals. (Lookup Image
         I011)
         Click OK.
         Here are the four plots which should now be displayed in your Viewer: (Lookup Image I012)
         If the lower-right plot appears as a curve instead of bars, use Edit - Options (or the corresponding toolbar
         button, or Ctrl+L) and select Bars as Plot Layout.
100 Multivariate Curve Resolution of Dye Mixtures (Tutorial I)                               The Unscrambler Tutorials
        Tutorial I - Interpret MCR results
        Task
        Determine the optimum number of pure components.
        How to Do It
        In the Total Residuals, MCR Fitting plot, residuals are high for 2 components, low for 3 components, and not
        significantly decreasing for 4 components. (Lookup Image I012) This suggests that 3 components is the
        optimum solution.
        Click and activate the Estimated Spectra plot with 3 components, and enlarge it by clicking Window - Copy
        To - 1. The toolbar contains a set of buttons used to navigate between results at different numbers of
        components. Use these buttons to increase and decrease the number of components, and watch the impact
        on the profiles.
        As you can see, the 4-component solution contains two almost identical spectral profiles. This also suggests
        that 4 components may not be the optimum number, and that the mixtures contain three pure components only.
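        The visual check for "almost identical" profiles can also be made numerically. The sketch below computes
        the cosine similarity between every pair of profiles and flags pairs above a threshold. The profiles and
        the 0.99 threshold are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine of the angle between two profiles (1.0 means same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def near_duplicates(profiles, threshold=0.99):
    """Return index pairs (0-based) of profiles that are nearly identical."""
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(profiles), 2):
        if cosine(p, q) > threshold:
            pairs.append((i, j))
    return pairs

# Hypothetical 4-component solution; profiles 1 and 2 have nearly the same shape
profiles = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.2, 0.0],
    [0.0, 0.98, 0.22, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
duplicates = near_duplicates(profiles)
print(duplicates)
```

        A flagged pair suggests that one component too many was extracted.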
        Task
        Run an MCR calculation with Initial Guess.
        How to Do It
        If prior knowledge such as spectra of pure components or concentrations of mixture samples exists, you may
        include this information in the MCR calculation to help the algorithm converge towards the right solution of
        curve resolution.
        Go back to the Tutor_i data table by using menu Window - Tutor_i. Click Task - MCR. The MCR
        dialog box with default settings will open up. In the dialog box, click Enable Initial Guess and select option
        Spectra (Samples). (Lookup Image I013)
        Click the Select button and pick rows 1 to 3 as initial guess for spectra (Lookup Image I014), then click
        OK to return to the MCR dialog box.
        Click OK to launch the calculations, then View to open the model results. (Lookup Image I015)
        Save the result file as Dye_Result2.
        Notes
        1. When using the initial guess option, The Unscrambler requires all pure components to be included as
        initial guess inputs. Partial reference will generate erroneous results. It is recommended to run MCR without
        initial guess if only partial reference is available.
        2. The Unscrambler only requires either spectra or concentration of pure component as an initial guess input.
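        To make the role of the initial guess concrete, here is a minimal alternating least squares sketch with
        non-negativity imposed by clipping. It is a generic textbook-style illustration on synthetic data, not
        The Unscrambler's MCR algorithm. The function name mcr_als, the Gaussian band shapes and all numeric
        values are assumptions.

```python
import numpy as np

def mcr_als(D, S0, n_iter=20):
    """Resolve D (samples x wavelengths) into C (samples x components)
    and S (components x wavelengths), starting from initial spectra S0."""
    S = S0.copy()
    for _ in range(n_iter):
        # Concentrations given spectra: least squares fit of D = C S, then clip
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0, None)
        # Spectra given concentrations: least squares fit of D = C S, then clip
        S = np.linalg.lstsq(C, D, rcond=None)[0]
        S = np.clip(S, 0, None)
    return C, S

# Synthetic stand-in for three dyes measured at 250-800 nm in 10 nm steps
wavelengths = np.arange(250, 801, 10)
centers = [350, 500, 650]  # hypothetical band positions
true_S = np.array([np.exp(-((wavelengths - c) / 40.0) ** 2) for c in centers])
rng = np.random.default_rng(0)
true_C = rng.uniform(0.1, 1.0, size=(36, 3))  # 36 hypothetical mixtures
D = true_C @ true_S

# Initial guess: spectra of all three pure components
C, S = mcr_als(D, S0=true_S)
relative_residual = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
print(relative_residual)
```

        Every component is given a starting profile here, in line with note 1 above.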
         Tutorial I - Validate the Estimated Results with Reference
         Information
         Task
         We are going to compare the model's Estimated Concentrations for a 3-component solution to the existing
         reference concentrations found in the data table and plotted earlier. As a first step, we are going to compare
         the concentration profiles visually.
         How to Do It
         Select the Estimated Concentrations plot, then use menu Window - Copy To - 1. Reduce the window size of
         the plot on your screen. Then go back to the data table (Window - Tutor_i) and build a line plot of the three
         concentrations (first 3 columns of the table). Resize the windows of the two plots in order to compare them on
         screen. (Lookup Image I016)
         You can observe that the 1st estimated concentration profile is similar to the reference profile of the blue dye
         (blue curves on the plots), the 2nd estimated concentration profile is similar to the reference profile of the green
         dye (red curves on the plots), and the 3rd estimated concentration profile is very close to the reference
         concentration of the orange dye (green curves on the plots).
         Note!
         Estimated concentrations are relative values within an individual component itself. Estimated concentrations of
         a sample are NOT its real composition.
         The estimated spectral profiles can be compared to the reference spectral profiles in the same way as for the
         concentrations. Because we used the spectra as initial guess inputs in this example, the comparison shows a
         perfect match. However, estimated spectra are unit-vector normalized; they are not the real spectral profiles
         of the samples. (Lookup Image I017)
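         The sketch below illustrates what unit-vector normalization does to a spectrum, assuming it means scaling
         to Euclidean length 1 (the tutorial does not spell out the norm). The spectrum values are hypothetical.

```python
import math

def unit_normalize(spectrum):
    """Scale a spectrum so that its Euclidean length equals 1."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum]

# Hypothetical spectrum: the shape is kept, the absolute scale is lost
raw = [3.0, 4.0, 0.0]
normalized = unit_normalize(raw)
length = math.sqrt(sum(v * v for v in normalized))
print(normalized, length)
```

         Two spectra that differ only by a scale factor become identical after this step, which is why the
         estimated spectra match the references in shape but not in absolute intensity.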
         Tasks
          
          Import the MCR result matrix of estimated concentrations,
          
          Compare the estimated concentrations to the reference concentrations in 2D scatter plots,
          
          Convert the estimated concentrations into real scale.
         How to Do It
         Use menu File - Import - Unscrambler Results, and select your MCR result file Dye_Result2. Click
         Import. The Import from MCR Result dialog box will open up. Select matrix Estimated Conc and type in 3
         in the PCs box, to import the concentration profiles for a 3-component mixture system. (Lookup Image
         I018)
         Click OK to perform the importation. A new data table Dye_Result2_Estimated Conc is generated.
         (Lookup Image I019)
         Insert three empty rows at the top of this table, so that the table has a total of 39 rows. (Lookup Image I020)
        Go to table Tutor_i, select the first three columns (blue, green and orange), copy them and paste them at the
        beginning of the new data table. We now have a table of six columns, containing the three measured
        concentrations of the pure dyes followed by the three estimated concentrations. (Lookup Image I021)
        Select columns Blue and 1 (press the Ctrl key on your keyboard to select several columns at a time). Click
        Plot - 2D Scatter to display a 2D Scatter plot of these columns. The correlation between estimated and
        reference concentrations for the blue dye is 0.994. If the box containing plot statistics (including the
        correlation) is not displayed in the upper left corner of your plot, use View - Plot Statistics to display it.
        For the green dye (columns Green and 2 in the table), the correlation between estimated and reference
        concentrations is 0.997.
        As for the orange dye (columns Orange and 3), the correlation is 0.998. These very high correlations
        indicate that the MCR calculations have determined concentration profiles accurately in this case. (Lookup
        Image I022)
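        Correlation is a suitable measure here because it is insensitive to scale: estimated concentrations in
        relative units can still correlate perfectly with reference values in real units. The sketch below shows
        the Pearson correlation on hypothetical numbers, not on the tutorial data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: the estimates are the references on a different scale
reference = [2.0, 5.0, 7.0, 10.0]
estimated = [v / 15.0 for v in reference]
r = pearson(reference, estimated)
print(round(r, 3))
```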
        Now let us convert the estimated Orange concentrations to real scale. In order to do this, at least one reference
        measurement is needed. The estimated concentrations (in relative scale) of all samples can be converted into
        real concentration scale by multiplying by a factor <real concentration / estimated concentration>.
        In the present case, we can use for example sample PROBE_11, which has a reference concentration of Orange
        dye of 7 and an estimated concentration of 0.4443.
        Use menu Edit - Append - Variables to append a new column at the end of the table, and name it MCR
        Orange real scale. Go to Modify - Compute General, and type in the expression: V7=V6*(7/0.4443)
        in the Expression space. (Lookup Image I023)
        Click OK to perform the calculation. The new column fills up with the values of estimated Orange dye
        concentrations converted to real scale. (Lookup Image I024)
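        The same conversion as the expression V7=V6*(7/0.4443) can be written out as a short sketch. The
        reference pair for PROBE_11 comes from the text above; the other estimated values in the list are
        hypothetical.

```python
# Reference pair quoted in the tutorial for sample PROBE_11 (Orange dye)
reference_orange = 7.0
estimated_orange = 0.4443

# Factor <real concentration / estimated concentration>
factor = reference_orange / estimated_orange

def to_real_scale(estimated):
    """Convert relative MCR concentrations to real scale."""
    return [value * factor for value in estimated]

# First value is PROBE_11; the other two are hypothetical estimates
converted = to_real_scale([0.4443, 0.2000, 0.1000])
print(round(factor, 3), [round(v, 2) for v in converted])
```

        Because a single reference sample fixes the factor, any error in that one measurement propagates to
        all converted values.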
Constraint Settings in Multivariate Curve Resolution
(Tutorial J)
         Description of Tutorial J
         Context of Tutorial J
         In this tutorial we will utilize FTIR spectra of an esterification reaction to extract pure spectra and their relative
         concentrations. The original data are from University of Rhode Island (Prof. Chris Brown), USA.
         The esterification reaction of iso-propanol and acetic anhydride using pyridine as a catalyst in carbon
         tetrachloride solution was monitored by FTIR. The initial concentrations of these three chemicals were 15%,
         10% and 5% in volume, respectively. Iso-propyl acetate was one of the products in this typical esterification
         reaction. The reaction was carried out in a ZnSe cell, and mixture spectra were measured at 4 cm-1 resolution.
         The data set consisted of 25 spectra, covering approximately 75 minutes of the reaction. To shift the
         equilibrium of the esterification, one-tenth of the volume was removed from the cell at 24, 45 and 60 minutes.
         An equal amount of a single reactant was added to the cell in the sequence of acetic anhydride, pyridine and
         iso-propanol.
Note: You will find the illustrations for this tutorial (Image J001, etc) at the end of the document.
104 Constraint Settings in Multivariate Curve Resolution (Tutorial J)                          The Unscrambler Tutorials
        Task
        Run a PCA on the raw data.
        How to Do It
        Click Task - PCA to run a Principal Component Analysis and choose the following settings:
         
         Sample set: All Samples
         
         Variable set: All Variables
         
         Validation Method: Full cross-validation
         
         Num PCs: 10
        (Lookup Image J002)
Once the PCA calculations are done, click View to open the result viewer. (Lookup Image J003)
        Click Plot - Loadings, select a plot of type Line, and type in value 1-3 in field Vector 1, so that the first
        three principal components will be represented into the same line plot. (Lookup Image J004)
        Click OK to display the plot.
        Select another plotting area by clicking on it with the mouse, for example the upper-right subview. Click Plot -
        Loadings, select a plot of type Line, and type in value 4-6 in field Vector 1. Click OK to display the plot.
        (Lookup Image J005)
        You can see that the loadings along the 6th principal component are quite noisy. The program recommends four
        components as the optimal number of PCs in this model. Select the Explained Variance plot by clicking on it
        with the mouse, then click View - Numerical. (Lookup Image J006) As you can see, the explained
        variance globally reaches a plateau from the 4th principal component. The 5th and 6th PCs still show some slight
        increase; at that stage, it is difficult to know whether they represent noise or real information.
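        For readers who want to see where such a plateau comes from, the sketch below computes cumulative
        explained variance from the singular values of a mean-centered matrix. It uses synthetic data with four
        underlying components plus a little noise. It gives calibration variance only, without the cross
        validation used in the tutorial, and all sizes and noise levels are assumptions.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Percent of total variance explained by the first k PCs of X,
    computed on the mean-centered data (calibration variance only)."""
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return 100 * np.cumsum(variance) / variance.sum()

# Synthetic data: 25 spectra built from 4 components plus weak noise
rng = np.random.default_rng(0)
scores = rng.standard_normal((25, 4))
loadings = rng.standard_normal((4, 50))
X = scores @ loadings + 0.01 * rng.standard_normal((25, 50))

explained = cumulative_explained_variance(X)
print(np.round(explained[:6], 3))
```

        In such data the curve flattens after the 4th PC, and later PCs add only noise-level increments.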
        Now, study the Influence plot at the bottom-left corner of the Viewer. You may observe that sample 1 sticks
        out from the group of samples, with a high leverage and a high residual variance. Go to menu Plot - Sample
        Outliers to display a combination of four useful plots for outlier detection. The plot of Residual Sample
        Variance at the bottom-left corner indicates a high validation residual for sample 1. (Lookup Image J007)
        As there is no validation check in MCR, we may use the outlier information from the PCA in our MCR
        modelling later on.
        Task
        Build a first MCR model with default settings.
How to Do It
         Using menu Window - Tutor_j, go back to the data table. Click Task - MCR and keep the default
         settings:
          
          Sample set: All Samples
          
          Variable set: All Variables
          
          Non-negative concentrations: selected
          
          Non-negative spectra: selected
          
          Closure: not selected
          
          Unimodality: not selected
          
          Sensitivity to pure components: 100
         Note: MCR computations are demanding. Building the model can easily take several minutes depending on the
         size of the data set, the selected options and the capacity of your machine.
         Click View when the calculations are finished; the MCR result viewer opens. Notice that the program suggests
         4 as the optimal number of pure components, by indicating (4) at the bottom of each plot. (Lookup Image
         J009)
         Task
         Read the MCR Message List and follow the system's recommendation for the Sensitivity to pure
         components setting.
         How to Do It
         Click on menu View - MCR Message List in model mIR Result1 to check the recommendations given
         by the system. There are four types of recommendations:
          
          Type 1: Increase sensitivity to pure components
          
          Type 2: Decrease sensitivity to pure components
          
          Type 3: Change sensitivity to pure components (increase or decrease)
          
          Type 4: Baseline offset or normalization is recommended.
         In the present case, the system recommends changing the setting for sensitivity to pure components. (Lookup
         Image J010)
        The default setting (100) that was used for Sensitivity to pure components is usually a good starting point.
        After interpreting the results and reading the system recommendations, you can tune it up or down between 10
        and 190. The higher the Sensitivity, the more pure components will be extracted. Therefore, if too many
        components are extracted, it is recommended to reduce the setting. Conversely, if you would like to see
        more components at an almost undetectable level, or even some noise profiles, it is recommended to increase
        the setting.
        Go back to the data table and re-do the MCR calculation with a Sensitivity to pure components setting of
        150. (Lookup Image J011)
        The plot of Estimated spectra is now shown by default for 5 components instead of 4 in the previous model.
        (Lookup Image J012)
        One can compare those profiles with FTIR spectra of known constituents, and identify the 5 estimated spectra
        as pyridine, iso-propanol, a possible intermediate, propyl acetate and acetic anhydride, from curves 1-5
        respectively.
        Task
        Run MCR with a closure constraint. Compare two MCR models on the same data, with and without closure.
        How to Do It
        Among the MCR settings we have used so far, two types of constraints were not selected.
        A constraint of Unimodality can be applied to restrict the resolution to concentration profiles that have only
        one maximum.
        With a constraint of Closure, the resolution will yield concentration profiles whose sum is constant.
        In the present case, acetic anhydride was added at 24 minutes (between the 8th and the 9th samples), which
        means that the first 8 samples can be treated in closure conditions.
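        The two constraints described above can be expressed as simple checks on concentration profiles. The
        sketch below tests whether a set of profiles satisfies closure (constant row sums) and whether a single
        profile is unimodal (one maximum). It only checks the properties and does not enforce them during
        resolution. The function names and data are hypothetical.

```python
def satisfies_closure(concentrations, tol=1e-6):
    """True if every sample (row) has the same total concentration."""
    totals = [sum(row) for row in concentrations]
    return max(totals) - min(totals) <= tol

def is_unimodal(profile):
    """True if the profile rises (weakly) to a single maximum, then falls."""
    peak = profile.index(max(profile))
    rising = all(a <= b for a, b in zip(profile[:peak], profile[1:peak + 1]))
    falling = all(a >= b for a, b in zip(profile[peak:], profile[peak + 1:]))
    return rising and falling

# Hypothetical concentrations: rows are samples, columns are components
closed_system = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]
open_system = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.8]]  # material added at sample 3

print(satisfies_closure(closed_system), satisfies_closure(open_system))
print(is_unimodal([0, 1, 3, 2, 1]), is_unimodal([0, 2, 1, 3, 0]))
```

        Adding a reactant to the cell changes the total, which is why only the samples before the first
        addition can be treated under closure.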
        Go back to the data table and run a new MCR model with the following settings:
         
         Sample set: Closure [8]
           (contains the first 8 samples of the data table)
         
         Variable set: All Variables
         
         Non-negative concentrations: selected
         
         Non-negative spectra: selected
         
         Closure: selected
         
         Unimodality: not selected
         
         Sensitivity to pure components: 100
        (Lookup Image J013)
         Once the computations are finished, save the model file as mIR Result3.
         You may compare the resolved concentration and spectral profiles of pure components with and without the
         closure setting. To do that, compute a new MCR model on sample set Closure without checking the Closure
         constraint option. Save the new model file as mIR Result4 and compare the results to mIR Result3.
         The spectral profiles under the constraint of closure present higher peaks for pure component 1 (blue) for
         wavelengths around 110 and 1250 cm-1. (Lookup Image J014)
         You can also observe that under constraint of closure, the concentrations of the pure components always add
         up to 1. (Lookup Image J015)
         Task
         Use the interactive Recalculate functionality to remove samples or variables with high residuals.
         How to Do It
         Click menu Window - mIR_Result1 to bring back your first MCR model on screen.
         The Validation calculations of the PCA model that we built earlier indicated that Sample 1 was an outlier. We
         can check this again in the MCR model by looking at the PCA fitting residuals. Click on the bottom-right
         subview to highlight it, then use Plot - Residuals, choose sheet PCA Fitting and option Sample Residuals.
         You may notice a high residual showing for Sample 1, compared to the other samples. Let us build a model
         without this sample.
         Use the marking tools       to highlight sample 1 on one of the plots, for example the Sample Residuals,
         PCA Fitting plot. (Lookup Image J016)
         Click menu Task - Recalculate Without Marked to specify a new MCR calculation without sample 1.
         (Lookup Image J017)
         This brings you back to the MCR dialog, where Sample 1 is now included in the Keep Out Of Calculation
         field. You may launch the calculations to get the new MCR results.
         Note that similarly, you may want to keep out of the model non-targeted wavelength regions, or highly
         overlapped wavelength regions.
         Click Plot - Residuals and choose Variable Residuals. (Lookup Image J018)
        Mark any unwanted variables on the plot using the marking tools, for example variables around 1100-1140
        cm-1 which present very high residuals (Lookup Image J019), then use Task - Recalculate Without
        Marked to specify a new MCR calculation.
Tutorial C - Illustrations
         C001 The Light Absorbance Spectrum
           [Figure: y-axis Absorbance, log(1/T); points labeled 1 to 16]
C023 The Line Plot dialog with Source: Tutorial C and Matrix: ResYValTot
Tutorial D - Illustrations
        D006 The Enam FRD Analysis of Effects results displayed with Significance Testing
        Method: COSCIND
         D009 The Statistics results plotted as Percentiles and Mean and SDev (Design
         Samples)
        D011 The Statistics results plotted as Percentiles and Mean and SDev (Design
        samples and Center samples)
D012 The Response Surface dialog with the X-var sheet active
Tutorial E - Illustrations
         E001 Data Table with category variable Iris
Tutorial F - Illustrations
         F001 The Import ASCII dialog
        F005 The Category Variable Wizard - Enter Variable Name and Choosing Method
        dialog
Tutorial G - Illustrations
         G003 The Design Wizard - Define Mixture Variables dialog with three defined
         variables
G007 The Information dialog displayed upon exiting the Design Wizard
         G010 The Import Worksheet dialog - Selecting ranges for Data, Sample names and
         Variable names
         G014 The PLS2 Regression Progress dialog showing high residual variance and
         several warnings
G021 The Response Surface dialog with the X-variables sheet active
G023 The Response Surface for Accept with the optimum coordinates and value
        H007 Set Editor dialog for an OV2 data table. Primary Variable Sets, Secondary
        Variable sets and Sample sets can be defined
         H009 Set Editor dialog for an OV2 data table. A Primary Variable Set was defined, now
         the Secondary Variable Sets option is selected to define a new set
        H017 Sample Outliers plots, Wood Severity_model 1, with sample 18 marked with a
        circle
Tutorial I - Illustrations
        I001 Tutor_i data table, size 39x59
         [Line plot: x-axis Variables, 200 to 800; legend PROBE_01, PROBE_1B, PROBE_02, PROBE_2B, PROBE_03, PROBE_3B, PROBE_04]
         [Line plot: x-axis Variables, 200 to 800; legend BB_50, GR_50, OR_50]
         [Line plot: x-axis Samples, 10 to 40; legend Blue, Green, Orange]
I010 Estimated Spectra dialog, plotting estimated spectra for a 2-component solution
I020 Imported matrix after insertion of three empty rows to the top
        I024 Editor with a column presenting the estimated Orange concentrations converted
        to real scale.
J009 MCR Overview for model mIR Result1 with default settings
         J012 Estimated spectra with Sensitivity to pure components set at 150 (model mIR
         Result 2)