LipidTrend 0.99.2
LipidTrend is an R package designed to identify statistically significant differences in lipidomic feature-level trends between groups. It supports both one-dimensional and two-dimensional analyses of continuous lipid features (e.g., chain length, double bond count).
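The package includes three main functions:
- analyzeLipidRegion(): performs the region-based trend analysis.
- plotRegion1D(): visualizes the results of a one-dimensional analysis.
- plotRegion2D(): visualizes the results of a two-dimensional analysis.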
In addition to these core functions, several helper functions are available to facilitate the exploration and extraction of results from the returned LipidTrendSE object. For more details, please refer to the Helper Functions section.
To install LipidTrend, ensure that you have R 4.5.0 or later installed (see the R Project at http://www.r-project.org) and are familiar with its usage.
The LipidTrend package is available from the Bioconductor repository at http://www.bioconductor.org. Before installing LipidTrend, you must first install the core Bioconductor packages. If you have already installed them, you can skip the following step.
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install()
Once the core Bioconductor packages are installed, you can proceed with installing LipidTrend.
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("LipidTrend")
After the installation is complete, you're ready to start using LipidTrend. Now, let's load the package.
library(LipidTrend)
LipidTrend requires a SummarizedExperiment object as input data. It must contain the following components:
- Assay: A numeric matrix representing lipid abundance values, where each row corresponds to a lipid species and each column to a sample. Before running LipidTrend, the abundance data must undergo preprocessing to address missing or noisy values. This preprocessing should include filtering, imputation, and normalization.
- RowData: A data frame containing lipid feature information (e.g., double bond count, chain length, or other continuous variables), where each row corresponds to a lipid species and each column to a specific lipid feature.
- ColData: A data frame containing metadata for each sample, with the following columns:
  - sample_name: A unique identifier for each sample.
  - label_name: A display name used for plotting or grouping.
  - group: The experimental condition or biological group associated with each sample.
If you are already familiar with constructing a SummarizedExperiment object, you can skip the following section. Otherwise, refer to the example in the rest of this section to learn how to build a SummarizedExperiment object.
The abundance data is a matrix containing lipid abundance values across lipids and samples, where rows represent lipids and columns represent samples.
# load example abundance data
data("abundance_2D")
# view abundance data
head(abundance_2D, 5)
#> HSD17B12KO01 HSD17B12KO02 HSD17B12KO03 sgCtrl01 sgCtrl02
#> TG 33:1 0.7591750 0.3109753 0.4456624 0.008158902 0.04569340
#> TG 36:1 3.2885277 2.7623366 3.0669865 0.193283659 0.25024117
#> TG 36:2 1.9974341 1.3529295 1.5173635 0.187196921 0.28231711
#> TG 37:2 1.7893400 0.8872451 1.0449693 0.121982450 0.71770893
#> TG 38:0 0.3413125 0.3504082 0.3784324 0.031840554 0.04336764
#> sgCtrl03
#> TG 33:1 0.08986397
#> TG 36:1 0.36918016
#> TG 36:2 0.44319480
#> TG 37:2 1.13574059
#> TG 38:0 0.05876135
The lipid characteristic table is a data frame containing information about each lipid’s characteristics, such as the number of double bonds and chain length. The order of the lipids in this table must align with the abundance data.
A one-dimensional analysis is conducted if the table has one column, and a two-dimensional analysis is performed if it has two columns. The table can have at most two columns.
In this example, we use data suitable for a two-dimensional analysis.
# load example lipid characteristic table (2D)
data("char_table_2D")
# view lipid characteristic table
head(char_table_2D, 5)
#> Total.C Total.DB
#> TG 33:1 33 1
#> TG 36:1 36 1
#> TG 36:2 36 2
#> TG 37:2 37 2
#> TG 38:0 38 0
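As a quick sanity check before building the input object, the alignment requirement can be verified in base R. The sketch below uses small toy tables; the values and the object names `abund` and `char_tab` are made up for illustration and are not part of LipidTrend.

```r
# Toy abundance matrix: rows are lipids, columns are samples (made-up values)
abund <- matrix(
    c(0.76, 3.29, 2.00, 0.31, 2.76, 1.35),
    nrow=3,
    dimnames=list(c("TG 33:1", "TG 36:1", "TG 36:2"), c("S1", "S2")))

# Toy lipid characteristic table with two feature columns
char_tab <- data.frame(
    Total.C=c(33, 36, 36), Total.DB=c(1, 1, 2),
    row.names=c("TG 33:1", "TG 36:1", "TG 36:2"))

# The lipid order must be identical in both tables
aligned <- identical(rownames(abund), rownames(char_tab))

# One feature column means a 1D analysis, two columns a 2D analysis
n_dim <- ncol(char_tab)
stopifnot(aligned, n_dim %in% c(1, 2))
cat("aligned:", aligned, "| analysis type:", paste0(n_dim, "D"), "\n")
```

If the row names differ only in order, reordering one table by the other's row names before constructing the object is usually enough.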
The group information table is a data frame containing grouping details corresponding to the samples in the lipid abundance data. It must adhere to the following requirements:
- The table must contain the columns sample_name, label_name, and group.
- The sample names in the sample_name column must match those in the lipid abundance data.
- The columns sample_name, label_name, and group must not contain missing values (NA).
For example:
# load example group information table
data("group_info")
# view group information table
group_info
#> sample_name label_name group
#> 1 HSD17B12KO01 HSD17B12KO01 HSD17B12KO
#> 2 HSD17B12KO02 HSD17B12KO02 HSD17B12KO
#> 3 HSD17B12KO03 HSD17B12KO03 HSD17B12KO
#> 4 sgCtrl01 sgCtrl01 sgCtrl
#> 5 sgCtrl02 sgCtrl02 sgCtrl
#> 6 sgCtrl03 sgCtrl03 sgCtrl
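The requirements on the group information table can be checked before building the input object. The following is a minimal base R sketch; `check_group_info()` is a hypothetical helper written for this illustration (it is not a LipidTrend function), and the toy tables use made-up values.

```r
# Hypothetical helper: checks the three requirements on the group table.
# `group_tab` is the group information table, `abund` the abundance matrix.
check_group_info <- function(group_tab, abund) {
    required <- c("sample_name", "label_name", "group")
    if (!all(required %in% colnames(group_tab))) {
        return("missing required column(s)")
    }
    if (any(is.na(group_tab[required]))) {
        return("missing values (NA) found")
    }
    if (!setequal(group_tab$sample_name, colnames(abund))) {
        return("sample_name does not match the abundance columns")
    }
    "ok"
}

# Toy inputs (made-up values)
samples <- c("KO01", "KO02", "Ctrl01", "Ctrl02")
abund <- matrix(
    seq(0.1, 0.8, by=0.1), nrow=2,
    dimnames=list(c("TG 33:1", "TG 36:1"), samples))
group_tab <- data.frame(
    sample_name=samples, label_name=samples,
    group=c("KO", "KO", "Ctrl", "Ctrl"))

check_group_info(group_tab, abund)
```

Returning a short message instead of stopping makes it easy to see which requirement failed; wrapping the call in `stopifnot(check_group_info(...) == "ok")` turns it into a hard check.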
Once the abundance data, lipid characteristic table, and group information table are prepared, we can construct the input SummarizedExperiment object. We will use the SummarizedExperiment() function from the SummarizedExperiment package. Follow the command below to create this object.
se_2D <- SummarizedExperiment::SummarizedExperiment(
    assays=list(abundance=abundance_2D),
    rowData=S4Vectors::DataFrame(char_table_2D),
    colData=S4Vectors::DataFrame(group_info))
LipidTrend workflow
The LipidTrend workflow starts with a SummarizedExperiment object as input. It supports both one-dimensional (1D) and two-dimensional (2D) lipid feature analyses. Based on the number of feature columns provided in the rowData (e.g., chain length, double bond count), the function automatically performs either 1D or 2D trend detection.
After statistical computation and visualization, the workflow returns a LipidTrendSE object holding the result tables, together with the corresponding trend plots.
This streamlined workflow enables researchers to identify structured lipidomic patterns across feature dimensions with statistical rigor and biological interpretability.
We recommend calling the set.seed() function before starting so that the results of the permutation process are reproducible across runs.
set.seed(1234)
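The effect of fixing the seed can be seen with a small base R example. The group labels below are toy values; shuffling them is only a stand-in for the label permutation used during testing.

```r
# Shuffle group labels twice with the same seed: the permutations are identical
groups <- c("KO", "KO", "KO", "Ctrl", "Ctrl", "Ctrl")

set.seed(1234)
perm_a <- sample(groups)
set.seed(1234)
perm_b <- sample(groups)

identical(perm_a, perm_b)
#> [1] TRUE
```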
One-dimensional analysis is applied when the input dataset contains a single continuous lipid feature, such as chain length or double bond count. This approach is ideal when the biological question centers on one specific biochemical property of lipids or when only one type of feature annotation is available.
Compared to two-dimensional analysis, the one-dimensional approach is more straightforward to interpret and requires less data completeness. It is particularly suitable in the following scenarios:
To begin, we will first examine the structure of the example input data to ensure it is correctly formatted for one-dimensional analysis.
# load example data
data("lipid_se_CL")
# quick look of SE structure
show(lipid_se_CL)
#> class: SummarizedExperiment
#> dim: 29 6
#> metadata(0):
#> assays(1): abundance
#> rownames(29): 33 36 ... 62 64
#> rowData names(1): chain
#> colnames: NULL
#> colData names(3): sample_name label_name group
Overview of Region-Based Trend Analysis
The analyzeLipidRegion() function performs region-based statistical analysis of lipidomic features, integrating both marginal testing and permutation-based smoothed testing to identify meaningful trends across continuous lipid features (e.g., chain length, double bond count).
This two-stage approach enhances statistical power and robustness, especially in small-sample datasets:
- Marginal Test: Each lipid feature is first tested individually using either a t-test (with glog10 transformation) or a Wilcoxon test. This step yields a marginal statistic and a corresponding marginal p-value for each feature.
- Region-Based Permutation Test with Smoothing: The resulting vector of marginal statistics is then smoothed using a Gaussian kernel, which integrates information from neighboring lipid features based on their similarity (e.g., proximity in chain length). A smoothed statistic is computed for each lipid, and an empirical p-value is derived via permutation testing by comparing the observed statistic to a null distribution.
Note:
- If test="t.test" (default), abundance values are internally transformed using the glog10 transformation before testing.
- If test="Wilcoxon", it is recommended to set permute_time to fewer than 10,000 to maintain a reasonable runtime, because the Wilcoxon test has a higher computational cost during repeated permutations.
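To make the two stages concrete, here is a minimal base R sketch of the general idea: a marginal t statistic per lipid, Gaussian-kernel smoothing across neighboring chain lengths, and an empirical p-value from label permutations. It is an illustration under stated assumptions, not the implementation inside analyzeLipidRegion(): the data are simulated, the kernel bandwidth `sd` and the helper names `marginal_stat()` and `smooth_stat()` are made up, and the glog10 transformation and abundance weighting are left out.

```r
set.seed(1)

# Toy data: 10 lipids ordered by chain length, 5 control and 5 case samples.
# Values are simulated on a log-like scale, so no transformation is applied.
chain <- seq(40, 58, by=2)
group <- rep(c("ctrl", "case"), each=5)
abund <- matrix(rnorm(100), nrow=10)
abund[4:7, group == "case"] <- abund[4:7, group == "case"] + 3  # shifted region

# Stage 1: marginal Welch t statistic per lipid (case vs. control)
marginal_stat <- function(x, g) {
    apply(x, 1, function(v) {
        unname(t.test(v[g == "case"], v[g == "ctrl"])$statistic)
    })
}

# Stage 2: Gaussian-kernel smoothing across neighboring chain lengths
# (the bandwidth `sd` is an illustrative choice)
smooth_stat <- function(stat, pos, sd=2) {
    w <- outer(pos, pos, function(a, b) dnorm(a - b, sd=sd))
    as.vector(w %*% stat / rowSums(w))
}

obs <- smooth_stat(marginal_stat(abund, group), chain)

# Null distribution: recompute the smoothed statistic after shuffling labels
n_perm <- 200
null <- replicate(
    n_perm, smooth_stat(marginal_stat(abund, sample(group)), chain))

# Two-sided empirical p-value per lipid (with the usual +1 correction)
p_emp <- (rowSums(abs(null) >= abs(obs)) + 1) / (n_perm + 1)
round(p_emp, 3)
```

Because each smoothed value borrows strength from its neighbors, a run of moderately shifted lipids can stand out even when individual marginal tests are weak, which is the motivation for the region-based test.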
Split-Chain Analysis for Chain Length Features
To enhance the biological interpretability of chain length–related trends, analyzeLipidRegion() provides an option, controlled by the split_chain parameter, to analyze even-chain and odd-chain lipids separately.
Enabling this option allows the function to separate lipids based on chain length parity (even vs. odd) and perform region-based statistical testing independently for each group. This is particularly beneficial when distinct biosynthetic or regulatory patterns are expected between even- and odd-chain lipid species.
To activate this feature, configure the split_chain and chain_col parameters as follows:
- split_chain=TRUE: specify the chain length column through the chain_col parameter.
- split_chain=FALSE (default): leave chain_col=NULL.
Recommendation:
Set split_chain=TRUE when analyzing features such as chain length, as it often leads to more meaningful biological insights.
Note:
- If fewer than two lipids are present in either the even or odd group, analysis for that group will be skipped, and a warning will be issued.
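The parity split itself is simple to picture. The sketch below shows one way to separate chain lengths into even and odd groups in base R and to flag groups that are too small; the chain lengths are toy values and the code only mirrors the described behavior, not the package internals.

```r
# Toy chain lengths (made-up values)
chain <- c(33, 36, 37, 38, 40, 41)

# Even/odd parity via the modulo operator
parity <- ifelse(chain %% 2 == 0, "even", "odd")
groups <- split(chain, parity)
groups

# Groups with fewer than two lipids would be skipped with a warning
too_small <- names(groups)[lengths(groups) < 2]
if (length(too_small) > 0) {
    warning(
        "skipping group(s) with fewer than two lipids: ",
        toString(too_small))
}
```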
Abundance-Weighted vs. Unweighted Statistics
To reflect the biological importance of more abundant lipids, we provide an option controlled by the abund_weight parameter, which allows for weighting region statistics based on the average abundance of each lipid species.
- If abund_weight=TRUE (default), the marginal test statistic is scaled by each lipid's normalized average abundance during the smoothing step. This emphasizes biologically dominant signals while down-weighting low-abundance, potentially noisy lipids.
- If abund_weight=FALSE, all lipids are treated equally, regardless of their abundance. In this case, the region statistic is calculated solely based on test statistics and feature similarity.
This flexibility allows users to choose between abundance-weighted and unweighted region statistics.
Note:
- Abundance weighting is applied only during the smoothing and permutation steps; it does not affect the initial marginal testing.
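The difference between the two settings can be illustrated with a few toy numbers. The sketch below assumes, purely for illustration, that the average abundance is normalized by dividing by the largest value; the normalization actually used by analyzeLipidRegion() is not specified here and may differ.

```r
stat <- c(2.1, 3.4, 2.8, 0.3)         # toy marginal statistics
avg_abund <- c(0.5, 4.0, 20.0, 0.1)   # toy average abundances

# Assumed normalization: divide by the largest average abundance
w <- avg_abund / max(avg_abund)

weighted <- stat * w    # analogue of abund_weight=TRUE
unweighted <- stat      # analogue of abund_weight=FALSE

data.frame(stat, avg_abund, weight=w, weighted, unweighted)
```

With weighting, the low-abundance lipids contribute very little to the values that enter the smoothing step, while the most abundant lipid keeps its full statistic.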
# run analyzeLipidRegion
res1D <- analyzeLipidRegion(
    lipid_se_CL, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="chain", radius=3, own_contri=0.5, test="t.test",
    abund_weight=TRUE, permute_time=1000)
# view result summary
show(res1D)
#> class: LipidTrendSE
#> dim: 29 6
#> metadata(0):
#> assays(1): abundance
#> rownames(29): 33 36 ... 62 64
#> rowData names(1): chain
#> colnames: NULL
#> colData names(3): sample_name label_name group
#>
#> LipidTrend Results:
#> ------------------------
#> Split chain analysis: Yes
#> Even chain result: 15 features
#> Odd chain result: 14 features
The analyzeLipidRegion() function produces an extended SummarizedExperiment object called LipidTrendSE. To facilitate result extraction, we offer several helper functions that make viewing the resulting data frame easier. For more information on these helper functions, please refer to the Helper Functions section.
# view even chain result (first 5 lines)
head(even_chain_result(res1D), 5)
#> chain avg.abund.ctrl avg.abund.case direction smoothing.pval.BH
#> 36 36 0.5751379 4.661859 + 0.001071429
#> 38 38 0.7665390 3.708691 + 0.001071429
#> 40 40 0.9500241 6.091883 + 0.001071429
#> 42 42 4.4355047 20.377929 + 0.001071429
#> 44 44 19.1928487 61.133996 + 0.001071429
#> marginal.pval.BH log2.FC significance
#> 36 0.0003639142 3.018926 Increase
#> 38 0.0007890215 2.274479 Increase
#> 40 0.0002324393 2.680852 Increase
#> 42 0.0004215113 2.199837 Increase
#> 44 0.0005093479 1.671406 Increase
# view odd chain result (first 5 lines)
head(odd_chain_result(res1D), 5)
#> chain avg.abund.ctrl avg.abund.case direction smoothing.pval.BH
#> 33 33 0.04790542 0.5052709 + 0.001
#> 37 37 0.65847733 1.2405181 + 0.001
#> 39 39 0.11565136 1.2085836 + 0.001
#> 41 41 0.41424681 4.3259329 + 0.001
#> 43 43 1.80730482 14.2470104 + 0.001
#> marginal.pval.BH log2.FC significance
#> 33 2.565158e-02 3.3987963 Increase
#> 37 2.233191e-01 0.9137372 Increase
#> 39 5.134067e-05 3.3854631 Increase
#> 41 1.591450e-04 3.3844488 Increase
#> 43 1.050032e-03 2.9787475 Increase
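Because the helper functions return ordinary data frames, the results can be filtered and checked with base R. The sketch below copies the first three even-chain rows from the output above into a small data frame, so it runs without the package. In these rows, log2.FC agrees with log2(avg.abund.case / avg.abund.ctrl) to the printed precision; the 0.05 cutoff is only an example threshold.

```r
# First three even-chain rows, copied from the output above
res <- data.frame(
    chain=c(36, 38, 40),
    avg.abund.ctrl=c(0.5751379, 0.7665390, 0.9500241),
    avg.abund.case=c(4.661859, 3.708691, 6.091883),
    smoothing.pval.BH=c(0.001071429, 0.001071429, 0.001071429),
    log2.FC=c(3.018926, 2.274479, 2.680852))

# log2 fold change recomputed from the group averages
res$log2.FC.check <- log2(res$avg.abund.case / res$avg.abund.ctrl)

# Keep lipids whose smoothed, BH-adjusted p-value is below 0.05
sig <- res[res$smoothing.pval.BH < 0.05, ]
sig[, c("chain", "log2.FC", "log2.FC.check")]
```

The same filtering pattern applies to the full tables returned by even_chain_result() and odd_chain_result().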
This section demonstrates how to visualize the results from the LipidTrendSE object returned by the analyzeLipidRegion() function.
Note:
- If split_chain=TRUE was set in analyzeLipidRegion(), two separate plots will be generated: one for even-chain lipids, and one for odd-chain lipids.
- If split_chain=FALSE, only a single plot will be returned, showing all lipids together.
# plot result
plots <- plotRegion1D(res1D, p_cutoff=0.05, y_scale='identity')
# even chain result
plots$even_result
# odd chain result
plots$odd_result
The visualization illustrates lipid trends and includes the following components:
Color Interpretation:
These visualizations highlight not only the magnitude of lipid abundance changes but also the specific feature-level regions (e.g., chain length) where group differences are most pronounced.
Two-dimensional analysis is applicable when the input dataset includes two continuous lipid features, such as chain length, double bond count, or other numeric lipid characteristics. Compared to one-dimensional analysis, 2D analysis enables the detection of more complex patterns by simultaneously evaluating lipid trends across two biochemical axes.
This method is particularly well-suited for the following scenarios:
Two-dimensional analysis provides a high-resolution map of lipid changes, allowing for the identification of specific combinations of features (e.g., long-chain saturated vs. short-chain unsaturated lipids) that may indicate pathway-level regulation.
Let’s now take a quick look at the structure of the example input data used for 2D analysis.
# load example data
data("lipid_se_2D")
# quick look of SE structure
show(lipid_se_2D)
#> class: SummarizedExperiment
#> dim: 137 6
#> metadata(0):
#> assays(1): abundance
#> rownames(137): TG 33:1 TG 36:1 ... TG 64:5 TG 64:8
#> rowData names(2): Total.C Total.DB
#> colnames: NULL
#> colData names(3): sample_name label_name group
The two-dimensional analysis uses the same analyzeLipidRegion() function as the one-dimensional analysis. The two-stage testing procedure and the split_chain and abund_weight options work as described in the one-dimensional analysis section under Overview of Region-Based Trend Analysis, Split-Chain Analysis for Chain Length Features, and Abundance-Weighted vs. Unweighted Statistics.
# run analyzeLipidRegion
res2D <- analyzeLipidRegion(
    lipid_se_2D, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="Total.C", radius=3, own_contri=0.5, test="t.test",
    abund_weight=TRUE, permute_time=1000)
# view result summary
show(res2D)
#> class: LipidTrendSE
#> dim: 137 6
#> metadata(0):
#> assays(1): abundance
#> rownames(137): TG 33:1 TG 36:1 ... TG 64:5 TG 64:8
#> rowData names(2): Total.C Total.DB
#> colnames: NULL
#> colData names(3): sample_name label_name group
#>
#> LipidTrend Results:
#> ------------------------
#> Split chain analysis: Yes
#> Even chain result: 81 features
#> Odd chain result: 56 features
The analyzeLipidRegion() function produces an extended SummarizedExperiment object called LipidTrendSE. To facilitate result extraction, we offer several helper functions that make viewing the resulting data frame easier. For more information on these helper functions, please refer to the Helper Functions section.
# view even chain result (first 5 lines)
head(even_chain_result(res2D), 5)
#> Total.C Total.DB avg.abund direction smoothing.pval.BH marginal.pval.BH
#> TG 36:1 36 1 1.6550926 + 0.001038462 7.707795e-05
#> TG 36:2 36 2 0.9634060 + 0.001038462 1.839016e-03
#> TG 38:0 38 0 0.2006871 + 0.001038462 7.391351e-05
#> TG 38:1 38 1 0.8449176 + 0.001038462 1.052245e-04
#> TG 38:2 38 2 0.9041542 + 0.001038462 2.100744e-03
#> log2.FC significance
#> TG 36:1 3.487890 Increase
#> TG 36:2 2.415022 Increase
#> TG 38:0 2.997840 Increase
#> TG 38:1 2.970089 Increase
#> TG 38:2 1.708606 Increase
# view odd chain result (first 5 lines)
head(odd_chain_result(res2D), 5)
#> Total.C Total.DB avg.abund direction smoothing.pval.BH marginal.pval.BH
#> TG 33:1 33 1 0.2765882 + 0.001056604 0.0232316173
#> TG 37:2 37 2 0.9494977 + 0.103854545 0.2273794624
#> TG 39:0 39 0 0.1750303 + 0.001056604 0.0001638433
#> TG 39:1 39 1 0.4870872 + 0.001056604 0.0001274996
#> TG 41:0 41 0 0.5079328 + 0.001056604 0.0001506772
#> log2.FC significance
#> TG 33:1 3.3987963 Increase
#> TG 37:2 0.9137372 NS
#> TG 39:0 3.8873738 Increase
#> TG 39:1 3.2356822 Increase
#> TG 41:0 2.8007824 Increase
This section demonstrates how to visualize the results from the LipidTrendSE object returned by the analyzeLipidRegion() function.
Note:
- If split_chain=TRUE was set in analyzeLipidRegion(), two separate plots will be generated: one for even-chain lipids, and one for odd-chain lipids.
- If split_chain=FALSE, only a single plot will be returned, showing all lipids together.
# plot result
plot2D <- plotRegion2D(res2D, p_cutoff=0.05)
# even chain result
plot2D$even_result
# odd chain result
plot2D$odd_result
This plot visualizes two-dimensional lipid features, highlighting regional trends and significant differences between case and control groups.
The X- and Y-axes represent two continuous lipid characteristics (e.g., total chain length and double bond count). Each point corresponds to a lipid, with the color indicating its log₂ fold-change (log₂FC) between case and control groups:
- Red points indicate higher abundance in case samples.
- Blue points indicate higher abundance in control samples.
If abund_weight=TRUE, point size reflects the mean abundance of each lipid. If abund_weight=FALSE, all points are displayed with equal size.
Asterisks (*, **, ***) denote levels of statistical significance based on the marginal test, with more asterisks representing stronger evidence.
Colored outlines (red or blue) represent significant regions identified by the smoothed region-based permutation test, which incorporates information from neighboring lipids with similar feature values:
- Red regions indicate a significant trend of increasing abundance in case samples.
- Blue regions indicate a significant trend of decreasing abundance in case samples.
This visualization enables detection of both individual lipid-level changes and region-level patterns, providing biologically meaningful trends across two lipid features.
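The asterisk notation can be reproduced for any vector of p-values with base R. The cutoffs used below (0.05, 0.01, 0.001) are a common convention and an assumption of this sketch; the thresholds used by plotRegion2D() are not stated here.

```r
# Toy marginal p-values (made-up values)
p <- c(0.2, 0.03, 0.004, 0.0002)

# Map p-values to asterisks with assumed cutoffs 0.05, 0.01, and 0.001
stars <- as.character(cut(
    p, breaks=c(0, 0.001, 0.01, 0.05, 1),
    labels=c("***", "**", "*", ""), include.lowest=TRUE))

data.frame(p, stars)
```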
LipidTrend provides four helper functions to enhance the viewing of the LipidTrendSE object returned by analyzeLipidRegion():
- result() – Returns the result data frame.
- even_chain_result() – Returns the even-chain result data frame.
- odd_chain_result() – Returns the odd-chain result data frame.
- show() – Displays a summary of the LipidTrendSE object.
Notes:
- If split_chain=TRUE, use even_chain_result() and odd_chain_result() to view the results separately. Otherwise, use result().
- To extract assay, rowData, or colData from the LipidTrendSE object, use functions from the SummarizedExperiment package.
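The SummarizedExperiment accessors can be tried on a small stand-alone object. The sketch below builds a toy SummarizedExperiment with made-up values (it does not use LipidTrend itself) and extracts the three components. Since LipidTrendSE is described as an extended SummarizedExperiment, the same accessor calls are expected to work on it, although that is not verified here.

```r
library(SummarizedExperiment)

# Toy abundance matrix: 2 lipids x 4 samples (made-up values)
abund <- matrix(
    c(0.76, 3.29, 0.31, 2.76, 0.01, 0.19, 0.05, 0.25), nrow=2,
    dimnames=list(
        c("TG 33:1", "TG 36:1"),
        c("KO01", "KO02", "Ctrl01", "Ctrl02")))

se <- SummarizedExperiment(
    assays=list(abundance=abund),
    rowData=S4Vectors::DataFrame(Total.C=c(33, 36), Total.DB=c(1, 1)),
    colData=S4Vectors::DataFrame(
        sample_name=colnames(abund), label_name=colnames(abund),
        group=c("KO", "KO", "Ctrl", "Ctrl")))

assay(se, "abundance")   # abundance matrix
rowData(se)              # lipid feature table
colData(se)              # sample metadata
```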
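The result table contains the following columns:
- Feature columns taken from rowData, such as chain length or double bond count. Column names vary depending on the input dataset.
- The remaining columns, as shown in the example outputs above: avg.abund (or avg.abund.ctrl and avg.abund.case in the one-dimensional results), direction, smoothing.pval.BH, marginal.pval.BH, log2.FC, and significance.
#> R version 4.5.1 (2025-06-13)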
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] LipidTrend_0.99.2 BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4
#> [3] SparseArray_1.9.1 robustbase_0.99-4-1
#> [5] lattice_0.22-7 digest_0.6.37
#> [7] magrittr_2.0.3 MKmisc_1.9
#> [9] evaluate_1.0.4 grid_4.5.1
#> [11] RColorBrewer_1.1-3 bookdown_0.43
#> [13] fastmap_1.2.0 Matrix_1.7-3
#> [15] jsonlite_2.0.0 ggnewscale_0.5.2
#> [17] limma_3.65.3 tinytex_0.57
#> [19] BiocManager_1.30.26 scales_1.4.0
#> [21] jquerylib_0.1.4 abind_1.4-8
#> [23] cli_3.6.5 crayon_1.5.3
#> [25] rlang_1.1.6 XVector_0.49.0
#> [27] Biobase_2.69.0 withr_3.0.2
#> [29] DelayedArray_0.35.2 cachem_1.1.0
#> [31] yaml_2.3.10 S4Arrays_1.9.1
#> [33] tools_4.5.1 dplyr_1.1.4
#> [35] ggplot2_3.5.2 matrixTests_0.2.3
#> [37] SummarizedExperiment_1.39.1 BiocGenerics_0.55.1
#> [39] vctrs_0.6.5 R6_2.6.1
#> [41] magick_2.8.7 matrixStats_1.5.0
#> [43] stats4_4.5.1 lifecycle_1.0.4
#> [45] Seqinfo_0.99.2 S4Vectors_0.47.0
#> [47] IRanges_2.43.0 pkgconfig_2.0.3
#> [49] pillar_1.11.0 bslib_0.9.0
#> [51] gtable_0.3.6 Rcpp_1.1.0
#> [53] glue_1.8.0 statmod_1.5.0
#> [55] DEoptimR_1.1-4 xfun_0.53
#> [57] tibble_3.3.0 GenomicRanges_1.61.1
#> [59] tidyselect_1.2.1 MatrixGenerics_1.21.0
#> [61] knitr_1.50 dichromat_2.0-0.1
#> [63] farver_2.1.2 htmltools_0.5.8.1
#> [65] labeling_0.4.3 rmarkdown_2.29
#> [67] compiler_4.5.1