My brother is in town today with the Swans for the game against the Lions this weekend. We were able to get together at the Thai Rose restaurant at Woolloongabba. The Thai Rose was a good suggestion by Jason and Mel and John got to meet his grand-niece Abigail (now nearly 5 months) for the first time. We also enjoyed the food and the river-ferry ride back to John’s hotel afterwards.
SapidBlog has moved to WordPress.com
Posted by sapidblog on 27 August, 2010
Owing to constant feature shinkage at Edublogs.org we have moved SapidBlog to a new home at WordPress.com. We were among the earliest supporters of Edublogs.org but of late they keep removing features from the free blog and putting them into the paid blog feature list. Eventually, we lost the capability to even save our blog to a backup file without paying for it and decided that enough was enough.
BBEQF8B3R6GF
Posted in blogs | Leave a Comment »
Doing Chemometrics (and Loving It?)
Posted by sapidblog on 29 April, 2010
Well I’m not sure about the “loving it” part but this week the chemometrics sessions proceeded much more smoothly than the previous week. Setting up in class time was smoother and we didn’t create the expectation in students of being able to take the Matlab plug-ins home and expect them to work without trouble. This time we tried to create the expectation that the work was to be completed in the 3 hours of scheduled laboratory time. The use of the add-in outside of class is a bonus that shouldn’t be relied on.
This week we ran the new practical on determining octane numbers in fuels using NIRS (near-infrared reflectance spectrometry). This session is all computer-based because the data set used is from the paper: “Two data sets of near infrared spectra“ by John H. Kalivas in Chemometrics and Intelligent Laboratory Systems 37 (1997) 255-259. Nevertheless, it concerns a very practical question that everyone confronts whenever they refill their vehicle’s fuel tank at a petrol bowser, namely, what are these octane numbers and why is one more expensive than another?
In this practical the method of PLS (partial-least squares) was used to build a calibration model for a training set of fuel samples of octane number determined from the standard engine method. The calibration model is then validated using samples of known octane number that are independent of the calibration model. PLS was carried out in an Excel® spreadsheet with a custom plug-in constructed using Matlab® EX builder for Microsoft Excel® as can be seen in the screen image below:
The screen above demonstrates the “variable selection” step where the blue diamonds show the root mean squared error for calibration samples (RMSEC) as a function of the number of latent variables in the model. In this case the RMSEC tends to underestimate the true error because of “overfitting.” On the other hand, the open-green diamonds are the RMSEP for validation samples which gives a better estimate of the true error and shows that the optimum number of latent variables to include in the PLS model is 3-5 (that is without excessively under- or over-fitting). The PLS plugin is also capable of performing cross-validation. Plotting the known versus estimated octane numbers for the validation samples is shown below:
The PLS model is able to successfully predict octane numbers of the published data set without evidence of significant bias and with a RSD of about 1%.
While there was a range of comments from the students it seemed like this approach of getting them immersed more in the experience of carrying-out chemometrics was really helpful. The capability of performing the PLS analysis in an Excel® spreadsheet is extremely important because it enables the students to start producing results almost immediately without having to learn the ins-and-outs of a more complex software application such as Matlab® or Unscrambler®.
If you would like to know more about the use of the PLS plugin described here then please leave a comment below or use the “Contact” page above.
Posted in Chemistry, chemometrics, teaching | Leave a Comment »
Chemotherapy … NO! Chemometrics … YES!
Posted by sapidblog on 25 April, 2010
Owing to retirements at the end of last year, this semester I’ve picked up lectures on the subject of “Chemometrics.” In previous years I’ve overheard students talking about going to do “chemo.” The joke being that chemometrics lectures were so difficult and painful that those students compared them to enduring “chemotherapy” treatments. Truth is that I enjoyed the joke too (too much) – but now the jokes on me – as I’m the one giving those same chemo lectures.
I decided that what had to be done was to make chemo more active and get students involved with learning by being immersed in doing practical activities. One set of activities started last week was a moderate success: the previous week students had brought in tap water samples from home and analysed them by our inductively-coupled plasma – mass spectrometer (ICP-MS). Last week students used those same data to perform chemometrics analysis of Brisbane waters using chemometrics techniques [specifically principal component analysis (PCA) and hierarchical cluster analysis (HCA)] using the Matlab® EX builder for Microsoft Excel® as shown in the diagram below:
In the above diagram the ICP-MS waters data is in the format of samples in rows and elemental results in columns. The HCA add-in consists of the above VBA dialog control which provides refedit control elements for data entry using intuitive point, select and click methods. Options for selection of distance metrics and linkage methods are provided using drop-down lists and numerical output is presented in the spreadsheet in the form of a distance matrix and a binary linkage tree. The resulting dendrogram is presented as a chart in Matlab format as shown in the diagram below:
The Dendrogram above was obtained with standardised Euclidean distances and average linkages. It shows the tap water samples in red and the tank, reverse osmosis water, and bottled waters in black. The whole analysis is extremely rapid and simple to perform and only requires basic knowledge of Microsoft Excel®. ONCE IT IS SET UP!
The difficulty experienced was in setting up the add-ins an getting them to work reliably in a classroom situation. The add-in requires the Matlab compiler runtime (MCR) to be installed (this is essentially a stripped-down version of Matlab without any GUI, just the calculation engine). Then the add-in needs to be activated and running within Excel. The numerous steps involved all provide opportunities for things to go wrong and, unfortunately, they do go wrong quite often. So the students seem to appreciate the effort made in providing the software for them to use but have a fair bit of collective angst about getting it to work reliably. A lot of class time so far has been spent trouble shooting problems with the software rather than in actually doing chemometrics.
I’m embarassed to ask for help from the IT support people because I’m always working on such a last-minute basis. Also, I’m not sure that they’d like to see so much untested VBA code running around on their systems.
If you would like to know more about the use of the Matlab EX plugins described here then please leave a comment below or use the “Contact” page above.
Posted in Chemistry, chemometrics, eLearning, teaching | Leave a Comment »
Short-circuited Brains
Posted by sapidblog on 18 April, 2010
In chemical analysis using instruments usually involves preparing a set of standard solutions, measuring the instrument response and preparing a calibration plot of concentration versus instrument reading. The concentrations for the samples (and their associated uncertainties) can then be calculated from the regression equation for the calibration. The required equations can be found in textbooks such as Daniel C. Harris, “Quantitative Chemical Analysis” and are relatively straight forward to calculate, though with a calculator, rather tedious and prone to error because of the numerous steps involved and the need to store intermediate values in calculator memory. Using a spreadsheet simplifies the procedure, but the built-in Excel function LINEST is difficult to use and regression function provided with the Data Analysis ToolPak in Excel lacks the ability to calculate a concentration x given a y from the instrument reading (or to estimate the uncertainty for the concentration x).
So for the past few years I’ve provided a spreadsheet template with macros, the latest version of which is shown below. Before the spreadsheet template (pre-2005) many students were submitting unsatisfactory reports that neither contained regression calculations nor provided uncertainties. Having spent much of the current weekend marking reports (yet again!) the students now find it easy to complete the regression calculations and uncertainties. But because they find it easy they also forget to engage the brain and think about what it is that they’re doing with this spreadsheet and reflect upon the significance of the results produced: I think that I’ve created a tool that makes it possible for some students to short-circuit their brains.
The problems that were evident included:
- thinking that it was satisfactory just to print the spreadsheet and attach it to their report without critical comment or analysis
- not discussing selection of fitting options such as “fit-through-the origin”
- not being aware of what the estimates of uncertainty were or how to use them in subsequent calculations
Note added in proof (22nd April): as I marked more reports from students, the situation wasn’t as bad as I first thought. Approximately 30% of students did make some brain-short-circuiting mistakes of the kind mentioned above. But another 30% did provide more in-depth treatment of their results with no indication of brain-short-circuiting and the remainder were somewhere in-between. The situation is clearly superior to that of pre-2005 because, I think, the spreadsheet template has assisted in making the requirements more explicit to students. Previously, the requirements were implicit, with statements of the kind: “use the regression equations given in the textbook” without stating how this was to be performed or what level of detailed analysis was expected.
One of the limitations of the current approach in providing the above regression calibration template to students is the issue of ownership: are students using it because they think they need to in order to gain a favorable grade? Do they use it as a useful labour and time saving tool that helps them get on with the important aspect of critical discussion of the results? In the past, a number of students have gone on to Honours, Masters or PhD work and have commented on how useful they found earlier versions of the template. Some previous students have used in in later year units. But usually students seem to forget about it at the end of the examinations (like so many things).
Next semester I intend to work further on the issue of ownership by offering students the choice of using a manual or scientific calculator, spreadsheet calculations using {=LINEST} (similarly to the textbook) and the calibration template method above. Students are then responsible for making their own choices. I’m confident that this will remove a further layer of brain-short-circuiting by students. Hope to make another blog entry about it then.
If you would like to know more about the use of the spreadsheet template described here then please leave a comment below or use the “Contact” page above.
Posted in Chemistry, eLearning, teaching | Leave a Comment »
Microsoft Excel® ToolPak Confusion
Posted by sapidblog on 13 April, 2010
Marking, marking, marking all last weekend. I was marking a practical reports where the students were required to estimate the concentration of an acidic product using the class data set from 19 sets of titrations with (i) phenolphthalein and (ii) methyl orange as indicators. They were then required to compare mean concentrations derived using a t-test and the precisions of the means using a F-test. Microsoft Excel® spreadsheets using the Data Analysis ToolPak® were used to perform the statistical tests.
Every student in the class (about 40 students) became confused in one way or another by the use of the F-test, Two-Sample for Variances – results are displayed above. It appears that this confusion is mainly generated by the ToolPak itself for some of the following reasons:
- only the one-tail probabilities are displayed (both one- and two-tailed probabilities are given with the corresponding t-test tool)
- the initially selected alpha value is not displayed in the output (and easily lost track of)
- it is not at all clear what affect setting an alpha value will have in the final analysis results
After much confusion myself, I can now clarify at least some of these issues. Firstly, the one-tailed probability can be converted into a two-tailed probability by multiplying by a factor of 2, i.e., the p-value for the above results is 0.046 x 2 = 0.092 (two-tailed). This was the main source of confusion for students since if p = 0.046 this is less than α = 0.05 and the null hypothesis is rejected (implying that the precisions of the class titrations by the two indicators are different at the 95% confidence level). However, in this case a two-tailed probability should be used and then we have p = 0.092 > α = 0.05 and the null hypothesis should be accepted (implying no difference between the precisions).
The other issue leading to student confusion was about what value to use in the dialog box under the Alpha: entry box (see above diagram). If it is left at the default entry of 0.05 then the F-value calculated is for a one-tailed distribution with α =0.05, Fcrit = 2.22 which is less than Ftest = 2.26 leading to the same result as above: rejecting the null hypothesis. However, to obtain the two-tailed value for Fcrit we need to enter an Alpha: at half of 0.05 or 0.025. Then we obtain the two-tailed value for Fcrit = 2.60 which is greater than Ftest = 2.26 leading to the result that the null hypothesis should be accepted and that there is no difference in precision of the titrations for either phenolphthalein or methyl orange at the 95% confidence level.
Confused? Lost? Then you can understand the position that my students were in.
Given the stunning lack of success with my class I’m certainly tempted to not use the Excel® Data Analysis ToolPak ever again. But then again, since Excel® is a major tool used by industries that my students might find jobs with later on I’m pretty much stuck with it. At least for next semester I’ll have better instructions ready.
I’ve been using this practical excercise for 2 semesters a year since about 2004 and I’ve never encountered this situation before. It seems that this group is the most accomplished with performing titrations that I’ve ever encountered. No other class has ever come so close before to finding the indicator error from using these two indicators. In the past they found p-values that were always much greater than 0.05 and the issue of whether to perform a one-tailed or two-tailed test simply never arose.
I guess no good work ever goes unpunished.
The two-tailed test is most appropriate here since there is no a priori reason to confine the test to one side or the other of the distribution.
Posted in Chemistry, teaching | Leave a Comment »








