Duplicates in data and access online in R

Question from user:

I am using the platform to download protein data. Currently I am focused on the MBAA datasets but will eventually try to combine them with gene expression, HAI and more. When downloading the MBAA data I noticed that there are many duplicates/triplicates in the data. The values in each set differ but are similar. For example, in study 202, Sub120321 at day 0 has two CXCL5 measurements, 46 and 47. My best guess is that these samples were run in duplicate/triplicate in the assay. However, I can’t see any metadata to confirm this. Furthermore, that same study has samples that were not run in duplicate, hence my confusion.

I have also observed a similar story in ELISA and ELISPOT data but with very different values. Would I be safe to assume they are technical replicates and collapse into an average?

As a second issue, I am struggling to connect to the database in R. Please see my GitHub post with all the info.


Please see below for the netrc issue mentioned in the GitHub ticket:

Regarding the two multiplex bead array assay (MBAA) data points you noted for the same biological sample: your assumption that these are replicates appears to be correct. If you toggle the ‘full’ view on this dataset, you will see that both rows have the same biological sample ID.
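If the duplicate rows do share a biological sample ID, one way to collapse them is to group by that ID and the analyte and take the mean. The following is a minimal sketch in base R on toy data. The column names (biosample_id, analyte, concentration) and the BS1/BS2 rows are assumptions for illustration, not the actual ImmuneSpace schema. It also keeps the replicate count and range per group, so that replicates with very different values (as described for ELISA and ELISPOT) can be inspected before averaging.

```r
# Toy data in long format. Column names and values are illustrative assumptions;
# 46 and 47 echo the CXCL5 example from the question, the BS2 row is made up.
mbaa <- data.frame(
  biosample_id  = c("BS1", "BS1", "BS2"),
  analyte       = c("CXCL5", "CXCL5", "CXCL5"),
  concentration = c(46, 47, 52),
  stringsAsFactors = FALSE
)

# Per-group summary: number of replicates, their mean, and their range.
summarise_group <- function(x) {
  c(n = length(x), mean = mean(x), range = max(x) - min(x))
}

# One output row per biological sample and analyte.
collapsed <- aggregate(
  concentration ~ biosample_id + analyte,
  data = mbaa,
  FUN  = summarise_group
)

# aggregate() stores the summary as a matrix column; flatten it into
# ordinary columns (concentration.n, concentration.mean, concentration.range).
collapsed <- do.call(data.frame, collapsed)

print(collapsed)
```

Whether averaging is appropriate for a given assay remains a judgment call; the count and range columns are there to support that check.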


The two rows’ ‘source accessions’, which in this case correspond to ‘experiment samples’, also differ. This indicates that the same biological sample was divided into multiple ‘experiment samples’, each of which was assayed. I do not believe that any other metadata on the legacy site (datatools.immunespace.org) conveys this aspect of the study design. However, on the new site (immunespace.org), we are working to include a study design viewer that would convey this type of information graphically and descriptively. It’s a work in progress, but we expect to be able to push it to our production systems in the first half of 2024.

On the connection issue: ensure that your ‘.netrc’ file is updated with ‘datatools.immunespace.org’ as the ‘machine’.
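As a rough sketch of the ‘.netrc’ entry mentioned above, assuming the standard netrc layout of machine, login and password fields. The login and password values are placeholders to be replaced with your own credentials; only the machine name comes from this thread.

```text
machine datatools.immunespace.org
login your_email@example.com
password your_password
```

The file normally lives in your home directory, and because it stores a password in plain text it is worth restricting its permissions to your own user.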