Tools Based Categorical Functions
Tools Based Categorical Functions
Array UNIQUE
Array FILTER
Array SEQUENCE
Date & Time NOW
Date & Time TODAY
Date & Time DATEDIF
Financial PMT
Logical IF
Logical IFERROR
Logical ISNUMBER
Logical ISTEXT
Lookup & Reference VLOOKUP
Lookup & Reference HLOOKUP
Lookup & Reference INDEX
Lookup & Reference MATCH
Lookup & Reference XLOOKUP
Math & Trigonometry SUM
Math & Trigonometry SUMIF
Math & Trigonometry SUMPRODUCT
Math & Trigonometry ROUND
Math & Trigonometry ROUNDUP
Math & Trigonometry ROUNDDOWN
Math & Trigonometry RANDBETWEEN
Statistical AVERAGE
Statistical COUNTIF
Statistical LARGE
Statistical SMALL
Text CONCATENATE
Text LEFT
Text RIGHT
Text MID
Text LEN
Text TRIM
Text UPPER
Text LOWER
Text PROPER
Text TEXT
Text TEXTJOIN
Text SUBSTITUTE
Text SEARCH
Text CHAR
Text CODE
Syntax (FORMULATEXT)
=FORMULATEXT(UNIQUE(A1:A10))
=FORMULATEXT(FILTER(A1:A10, A1:A10>50))
=FORMULATEXT(SEQUENCE(5, 1, 1, 1))
=FORMULATEXT(NOW())
=FORMULATEXT(TODAY())
=FORMULATEXT(DATEDIF(A1, B1, "d"))
=FORMULATEXT(PMT(5%/12, 60, -10000))
FORMULATEXT(IF(A1>50, "Pass", "Fail"))
=FORMULATEXT(IFERROR(A1/B1, "Error"))
=FORMULATEXT(ISNUMBER(A1))
=FORMULATEXT(ISTEXT(A1))
=FORMULATEXT(VLOOKUP(101, A2:C10, 2, FALSE))
=FORMULATEXT(HLOOKUP("Jan", A1:G3, 2, TRUE))
FORMULATEXT(INDEX(A1:C10, 3, 2))
=FORMULATEXT(MATCH(50, A1:A10, 0))
FORMULATEXT(XLOOKUP(101, A1:A10, B1:B10))
=FORMULATEXT(SUM(A1:A10))
=FORMULATEXT(SUMIF(A1:A10, ">50", B1:B10))
=FORMULATEXT(SUMPRODUCT(A1:A10, B1:B10))
=FORMULATEXT(ROUND(A1, 2))
=FORMULATEXT(ROUNDUP(A1, 2))
=FORMULATEXT(ROUNDDOWN(A1, 2))
=FORMULATEXT(RANDBETWEEN(1, 100))
=FORMULATEXT(AVERAGE(B1:B10))
=FORMULATEXT(COUNTIF(A1:A10, ">50"))
=FORMULATEXT(LARGE(A1:A10, 2))
=FORMULATEXT(SMALL(A1:A10, 2))
=FORMULATEXT(CONCATENATE(A1, " ", B1))
=FORMULATEXT(LEFT(A1, 5))
=FORMULATEXT(RIGHT(A1, 4))
=FORMULATEXT(MID(A1, 3, 4))
=FORMULATEXT(LEN(A1))
=FORMULATEXT(TRIM(A1))
=FORMULATEXT(UPPER(A1))
=FORMULATEXT(LOWER(A1))
=FORMULATEXT(PROPER(A1))
=FORMULATEXT(TEXT(A1, "MM/DD/YYYY"))
=FORMULATEXT(TEXTJOIN(", ", TRUE, A1:A5))
=FORMULATEXT(SUBSTITUTE(A1, "old", "new"))
=FORMULATEXT(SEARCH("cat", A1))
=FORMULATEXT(CHAR(65))
=FORMULATEXT(CODE(A1))
Explanation
Returns a list of unique values from a range or array.
Returns an array that meets a specified condition.
Generates an array of sequential numbers.
Returns the current date and time.
Returns the current date without the time.
Calculates the difference between two dates in days, months, or years.
Calculates the payment for a loan based on constant interest rate and payments.
Returns one value if a condition is TRUE and another if FALSE.
Returns a custom value if a formula results in an error; otherwise, returns the formula result.
Checks if a value is numeric, returning TRUE or FALSE.
Checks if a value is text, returning TRUE or FALSE.
Looks up a value in the first column and retrieves data from another column in the same row.
Searches for a value in the first row and retrieves data from the same column in another row.
Returns the value at a specific row and column intersection in a range.
Finds the position of a value in a range.
Searches a range or array and returns the corresponding value, with more flexibility than VLOOKUP.
Adds all the numbers in a range of cells.
Adds the values in a range that meet a specified condition.
Multiplies corresponding elements of arrays and sums the results.
Rounds a number to a specified number of decimal places.
Rounds a number up to a specified number of decimal places.
Rounds a number down to a specified number of decimal places.
Returns a random integer between two specified values.
Calculates the average (arithmetic mean) of a range of numbers.
Counts the number of cells in a range that meet a specific condition.
Returns the kth largest value in a data set.
Returns the kth smallest value in a data set.
Combines multiple text strings into one.
Extracts the first n characters from a text string.
Extracts the last n characters from a text string.
Extracts a substring starting at a specified position and length.
Returns the number of characters in a text string (including spaces).
Removes all extra spaces from text, leaving only single spaces.
Converts all text to uppercase.
Converts all text to lowercase.
Capitalizes the first letter of each word in a text string.
Formats a value and converts it to text according to a specific format.
Combines text from multiple cells with a specified delimiter.
Replaces occurrences of a specified text in a cell with another text.
Finds the position of a substring within a text string (case-insensitive).
Returns the character corresponding to a specified ASCII code.
Returns the ASCII code of the first character in a text string.
When to Use
To extract distinct values from datasets.
To filter data dynamically based on conditions.
To create automatic lists or indices for data processing.
To generate timestamps or display current system time.
To display or calculate with today’s date, such as for deadlines or schedules.
To measure durations, like age or project timelines.
To calculate monthly payments for loans or mortgages.
To categorize data based on conditions, like grading scores.
To handle errors like #DIV/0! in calculations gracefully.
To validate data types, such as numeric-only entries.
To validate if cells contain text instead of numbers.
To search for specific data, such as finding a name by ID.
To retrieve data from horizontally arranged tables.
To extract data based on position.
To locate the rank or position of a specific value.
To perform lookups without limitations of sorted data or column positions.
To calculate totals, like sales, expenses, or scores.
To sum sales or values that meet criteria, like region-specific totals.
To calculate weighted totals, like revenue = quantity × price.
To format numbers for consistent presentation or calculations.
To ensure results are rounded upwards, like for financial or production estimates.
To ensure results are rounded downwards.
To generate random numbers for testing or simulations.
To find average scores, prices, or performance metrics.
To count how many entries exceed a threshold or match a value.
To find top performers, like the 2nd highest sales or scores.
To find the lowest performers or smallest values in data.
To merge names, addresses, or other text into a single cell.
To extract prefixes, codes, or initials.
To retrieve suffixes, codes, or extensions.
To retrieve specific parts of a string, like codes or values.
To validate text length for entries like IDs or passwords.
To clean messy or improperly formatted data.
To standardize text to uppercase, like names or codes.
To standardize text to lowercase, like email addresses.
To format text like names or titles.
To format numbers, dates, or times in a desired layout.
To merge data like names or addresses into a single cell.
To clean or modify text data systematically.
To locate specific words or characters in text data.
To insert special characters using their ASCII codes.
To convert characters into their numeric ASCII codes.
Common Errors
Only works in newer versions of Excel.
Returns #CALC! if no match is found.
Only works in newer versions of Excel.
Updates dynamically.
Updates dynamically.
May give errors for invalid units (e.g., "m").
Errors if incorrect signs are used for inputs.
Errors if condition isn’t valid.
Doesn’t handle all error types.
Returns FALSE for blank cells.
Returns FALSE for blank cells.
Returns #N/A if value isn’t found.
Returns #N/A if value isn’t found.
Errors if row/column is out of bounds.
Returns #N/A if value isn’t found.
Returns #N/A if value isn’t found.
Non-numeric values are ignored.
Returns 0 if no cells match criteria.
Errors if ranges are unequal in size.
Rounds to the nearest value.
Always rounds up.
Always rounds down.
Changes value with every calculation refresh.
Non-numeric cells are ignored.
Returns 0 if no cells match criteria.
Returns #NUM! if k is invalid.
Returns #NUM! if k is invalid.
Replaced by CONCAT in newer versions.
Errors if num_chars is negative.
Errors if num_chars is negative.
Errors if start is beyond string length.
Works with spaces and special characters.
Doesn’t handle non-breaking spaces.
Works only with text.
Works only with text.
Doesn’t work well with abbreviations.
Errors if format code is invalid.
Skips empty cells if specified.
Case-sensitive by default.
Returns #VALUE! if not found.
Invalid codes return errors.
Returns #VALUE! if cell is empty.
Output Example
Array of unique values.
Filtered array > 50.
Array of numbers: 1, 2, 3, etc.
Current date and time.
Current date.
Difference in days.
Monthly payment amount.
"Pass" or "Fail" based on A1.
"Error" if B1 is 0.
TRUE or FALSE.
TRUE or FALSE.
Corresponding value from column 2.
Value from row 2 under "Jan".
Value in the 3rd row, 2nd column.
Position of 50 in A1:A10.
Corresponding value from B1:B10.
Total of A1 to A10.
Sum of B1:B10 where A1:A10 > 50.
Weighted sum of two ranges.
A1 rounded to 2 decimals.
A1 rounded up to 2 decimals.
A1 rounded down to 2 decimals.
Random number between 1 and 100.
Average of B1 to B10.
Count of cells > 50.
2nd largest value in A1:A10.
2nd smallest value in A1:A10.
Merged text, e.g., "John Smith".
First 5 characters of A1.
Last 4 characters of A1.
4 characters from position 3.
Length of A1.
Cleaned text without extra spaces.
Uppercase version of A1.
Lowercase version of A1.
Title-case version of A1.
A1 formatted as "MM/DD/YYYY".
Joined text with commas.
Modified text in A1.
Position of "cat" in A1.
Corresponding character, "A".
ASCII code of A1's first character.
Category Function Syntax (np.<function>)
Aggregation np.sum np.sum(array)
Aggregation np.mean np.mean(array)
Aggregation np.max np.max(array)
Aggregation np.min np.min(array)
Aggregation np.median np.median(array)
Aggregation np.cumsum np.cumsum(array, axis=None)
Aggregation np.cumprod np.cumprod(array, axis=None)
Array Creation np.zeros np.zeros(shape)
Array Creation np.ones np.ones(shape)
Array Creation np.eye np.eye(N, M=None, k=0)
Array Creation np.full np.full(shape, fill_value)
Array Manipulation np.concatenate np.concatenate((array1, array2), axis)
Array Manipulation np.diff np.diff(array, n=1, axis=-1)
Array Manipulation np.clip np.clip(array, a_min, a_max)
Array Manipulation np.sort np.sort(array, axis=-1)
Array Manipulation np.argsort np.argsort(array, axis=-1)
Conditional np.where np.where(condition, x, y)
Conditional np.nonzero np.nonzero(array)
Creation np.arange np.arange(start, stop, step)
Creation np.linspace np.linspace(start, stop, num)
Linear Algebra np.dot np.dot(a, b)
Linear Algebra np.linalg.inv np.linalg.inv(matrix)
Logical np.isnan np.isnan(array)
Logical np.isfinite np.isfinite(array)
Mathematical np.sqrt np.sqrt(array)
Mathematical np.exp np.exp(array)
Mathematical np.log np.log(array)
Mathematical np.power np.power(array, exponent)
Random np.random.rand np.random.rand(d0, d1, ...)
Random np.random.randint np.random.randint(low, high, size)
Set Operations np.unique np.unique(array)
Shape Manipulation np.reshape np.reshape(array, shape)
Shape Manipulation np.transpose np.transpose(array)
Shape Manipulation np.ravel np.ravel(array)
Statistical np.std np.std(array)
Statistical np.var np.var(array)
Usage Example
np.sum([1, 2, 3, 4])
np.mean([1, 2, 3, 4])
np.max([1, 2, 3, 4])
np.min([1, 2, 3, 4])
np.median([1, 2, 3, 4, 5])
np.cumsum([1, 2, 3])
np.cumprod([1, 2, 3])
np.zeros((2, 3))
np.ones((3, 2))
np.eye(3)
np.full((2, 2), 7)
np.concatenate(([1, 2], [3, 4]), axis=0)
np.diff([1, 3, 6, 10])
np.clip([1, 2, 3, 4], 2, 3)
np.sort([3, 1, 2])
np.argsort([3, 1, 2])
np.where([True, False, True], 1, 0)
np.nonzero([0, 1, 0, 2])
np.arange(0, 10, 2)
np.linspace(0, 1, 5)
np.dot([1, 2], [3, 4])
np.linalg.inv([[1, 2], [3, 4]])
np.isnan([1, np.nan, 3])
np.isfinite([1, np.nan, np.inf])
np.sqrt([1, 4, 9])
np.exp([0, 1, 2])
np.log([1, np.e, np.e**2])
np.power([1, 2, 3], 2)
np.random.rand(3, 2)
np.random.randint(0, 10, (3, 2))
np.unique([1, 2, 2, 3])
np.reshape([1, 2, 3, 4], (2, 2))
np.transpose([[1, 2], [3, 4]])
np.ravel([[1, 2], [3, 4]])
np.std([1, 2, 3, 4])
np.var([1, 2, 3, 4])
Explanation
Calculates the sum of array elements along a specified axis.
Returns the average (mean) of array elements.
Returns the maximum value in the array.
Returns the minimum value in the array.
Computes the median value (middle value) of the dataset.
Computes the cumulative sum of array elements along a specified axis.
Computes the cumulative product of array elements along a specified axis.
Creates an array of zeros with the specified shape.
Creates an array of ones with the specified shape.
Creates a 2D array with ones on the diagonal and zeros elsewhere.
Creates an array filled with a specified value and shape.
Joins two or more arrays along a specified axis.
Calculates the n-th order difference along the specified axis.
Limits the values in an array to a specified minimum and maximum.
Returns a sorted copy of an array along the specified axis.
Returns the indices that would sort an array.
Returns elements from x or y based on a condition.
Returns the indices of non-zero elements in an array.
Creates an array with values in the specified range and step.
Generates evenly spaced numbers between start and stop, inclusive.
Calculates the dot product of two arrays.
Computes the inverse of a square matrix.
Checks for NaN values in an array, returning a boolean array.
Checks for finite numbers, returning a boolean array.
Computes the square root of array elements element-wise.
Computes the exponential (e^x) of array elements element-wise.
Computes the natural logarithm (ln) of array elements.
Raises each element of the array to a specified power.
Generates an array of random values in the range [0, 1).
Returns random integers from a low (inclusive) to high (exclusive) range.
Returns the sorted unique values of an array.
Gives a new shape to an array without changing its data.
Reverses or permutes the axes of an array.
Flattens a multi-dimensional array into a 1D array.
Calculates the standard deviation of array elements.
Computes the variance of array elements.
When to Use
To compute totals across rows, columns, or entire datasets.
To calculate the central tendency of numeric data.
To find the highest value in a dataset.
To find the lowest value in a dataset.
To analyze the central tendency, particularly for skewed datasets.
To calculate running totals in datasets.
To calculate running products in datasets.
To initialize arrays with zero values for computations or placeholders.
To initialize arrays with ones for computations or placeholders.
To generate identity matrices or diagonal structures.
To quickly initialize arrays with a constant value.
To combine datasets horizontally or vertically.
To compute discrete differences or deltas in data.
To bound data values within a defined range.
To reorder data in ascending or descending order.
To determine sorting order without changing original data.
To filter or replace elements conditionally.
To locate positions of meaningful or non-zero values.
To generate sequences of numbers programmatically.
To create uniform grids or intervals in numerical analysis.
To perform matrix multiplication or vector dot products.
To solve systems of linear equations or invert transformations.
To identify and handle missing or undefined values.
To identify non-infinite, valid numbers.
To perform mathematical transformations on numeric data.
To perform exponential transformations in mathematical models.
To perform logarithmic transformations on datasets.
To apply polynomial transformations to data.
To create random datasets for testing or simulations.
To create random integer datasets.
To remove duplicates or identify unique elements.
To adjust the dimensions of an array for computations.
To swap rows and columns in matrices or multidimensional arrays.
To reshape data into a flat structure for computations.
To measure variability or dispersion in the dataset.
To analyze spread or variability in data.
Common Errors
Shape mismatch when specifying invalid axes.
Dividing by 0 for empty arrays.
Empty arrays raise ValueError.
Empty arrays raise ValueError.
Returns NaN for empty arrays.
Returns TypeError for non-numeric arrays.
Returns TypeError for non-numeric arrays.
Requires shape as a tuple for dimensions.
Requires shape as a tuple for dimensions.
Dimensions mismatch if non-square.
Requires valid shape and fill_value.
Axis mismatch or incompatible dimensions.
Invalid axis raises AxisError.
TypeError if bounds are invalid.
Returns ValueError if invalid axis.
Invalid axis raises AxisError.
Condition shape mismatch with x or y.
Empty arrays return empty indices.
Returns empty array if range is invalid.
Returns single value if num=1.
Incompatible shapes for multiplication.
Raises error for non-square or singular matrices.
Non-numeric arrays may raise errors.
Non-numeric arrays may raise errors.
Returns NaN for negative numbers.
Overflow errors for very large numbers.
Negative or zero values return NaN.
Non-numeric inputs raise TypeError.
Input dimensions must be integers.
Returns errors if bounds are invalid.
Works only with hashable data types.
Shape mismatch with total elements.
Shape mismatch for non-matrix objects.
Returns a copy, not a view of input.
Empty arrays may raise warnings.
Dividing by 0 for empty arrays.
Output Example
10
2.5
4
1
3
[1, 3, 6]
[1, 2, 6]
[[0., 0., 0.], [0., 0., 0.]]
[[1., 1.], [1., 1.], [1., 1.]]
[[1., 0., 0.], [0., 1., 0.]]
[[7, 7], [7, 7]]
[1, 2, 3, 4]
[2, 3, 4]
[2, 2, 3, 3]
[1, 2, 3]
[1, 2, 0]
[1, 0, 1]
(array([1, 3]),)
[0, 2, 4, 6, 8]
[0, 0.25, 0.5, 0.75, 1]
11
Inverted matrix of input.
[False, True, False]
[True, False, False]
[1., 2., 3.]
[1., 2.718, 7.389]
[0., 1., 2.]
[1, 4, 9]
Random 3x2 matrix of floats.
Random 3x2 matrix of integers.
[1, 2, 3]
[[1, 2], [3, 4]]
[[1, 3], [2, 4]]
[1, 2, 3, 4]
1.11803398874989
1.25
Category Function
Cumulative Analysis df.expanding
Data Aggregation df.groupby
Data Analysis df.nunique
Data Binning pd.qcut
Data Binning pd.cut
Data Cleaning df.isnull
Data Cleaning df.fillna
Data Cleaning df.dropna
Data Cleaning df.duplicated
Data Cleaning df.drop_duplicates
Data Concatenation pd.concat
Data Conversion pd.to_datetime
Data Conversion pd.to_numeric
Data Creation pd.DataFrame
Data Creation pd.Series
Data Exploration df.head
Data Exploration df.tail
Data Inspection df.info
Data Joining df.merge
Data Joining df.merge_asof
Data Renaming df.rename
Data Reshaping df.pivot_table
Data Reshaping pd.melt
Data Reshaping pd.pivot
Data Sampling df.sample
Data Sorting df.sort_values
Data Transformation df.apply
Data Transformation df.shift
File Reading pd.read_csv
Moving Aggregation df.rolling
Statistical Analysis df.corr
Statistical Analysis df.cov
Statistical Summary df.describe
Time Series df.resample
Syntax (pd.<function>)
df.expanding(min_periods).agg(func)
df.groupby(by).agg(func)
df.nunique(axis, dropna)
pd.qcut(x, q, labels)
pd.cut(x, bins, labels)
df.isnull()
df.fillna(value)
df.dropna(axis, how)
df.duplicated(subset, keep='first')
df.drop_duplicates(subset, keep='first')
pd.concat(objs, axis)
pd.to_datetime(data)
pd.to_numeric(arg, errors='raise')
pd.DataFrame(data, columns)
pd.Series(data, index)
df.head(n)
df.tail(n)
df.info()
pd.merge(left, right, on, how)
pd.merge_asof(left, right, on)
df.rename(columns=new_names)
df.pivot_table(values, index, columns, aggfunc)
pd.melt(frame, id_vars, value_vars)
df.pivot(index, columns, values)
df.sample(n, frac, random_state)
df.sort_values(by, ascending)
df.apply(func, axis)
df.shift(periods, axis)
pd.read_csv(filepath)
df.rolling(window).agg(func)
df.corr(method)
df.cov()
df.describe()
df.resample(rule).agg(func)
Usage Example
df.expanding(min_periods=1).sum()
df.groupby('A').mean()
df.nunique(axis=0)
pd.qcut(df['A'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
pd.cut(df['A'], bins=3, labels=['Low', 'Mid', 'High'])
df.isnull()
df.fillna(0)
df.dropna(axis=0, how='any')
df.duplicated(subset=['A'], keep=False)
df.drop_duplicates(subset=['A'])
pd.concat([df1, df2], axis=0)
pd.to_datetime(['2023-01-01', '2023-01-02'])
pd.to_numeric(['1', '2', 'a'], errors='coerce')
pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
pd.Series([1, 2, 3], index=['a', 'b', 'c'])
df.head(3)
df.tail(3)
df.info()
pd.merge(df1, df2, on='key', how='inner')
pd.merge_asof(df1, df2, on='time')
df.rename(columns={'A': 'a'})
df.pivot_table(values='A', index='B', aggfunc='sum')
pd.melt(df, id_vars='A', value_vars=['B', 'C'])
df.pivot(index='A', columns='B', values='C')
df.sample(n=3, random_state=42)
df.sort_values(by='A', ascending=False)
df['A'].apply(lambda x: x**2)
df.shift(periods=1)
pd.read_csv('data.csv')
df.rolling(window=3).mean()
df.corr(method='pearson')
df.cov()
df.describe()
df.resample('M').sum()
Explanation
Applies cumulative calculations to data.
Groups data by column(s) and applies an aggregation function.
Returns the number of unique values per column (or row if axis=1).
Divides data into quantile-based bins.
Divides data into interval-based bins.
Returns a DataFrame of boolean values indicating missing values (NaN).
Replaces missing values (NaN) with a specified value.
Drops rows (axis=0) or columns (axis=1) with missing values.
Returns a boolean Series indicating duplicate rows.
Removes duplicate rows from the DataFrame.
Concatenates two or more DataFrames along a specified axis.
Converts a list or column of strings into datetime objects.
Converts argument to numeric values, coercing or raising errors on invalid parsing.
Creates a DataFrame from a dictionary, list, or other data structures.
Creates a Series (1D labeled array) from a list, dictionary, or other data.
Displays the first n rows of a DataFrame.
Displays the last n rows of a DataFrame.
Displays a summary of the DataFrame, including data types and memory usage.
Combines two DataFrames based on a key column(s).
Merges two ordered DataFrames by key, performing an as-of join.
Renames columns or indices of the DataFrame.
Reshapes data by summarizing values across rows and columns.
Unpivots a DataFrame from wide to long format.
Reshapes data by specifying index, columns, and values.
Returns a random sample of rows or columns from the DataFrame.
Sorts a DataFrame by column values.
Applies a function along a specified axis of the DataFrame.
Shifts the index or data of the DataFrame by specified periods.
Reads a CSV file into a DataFrame.
Applies rolling window calculations to data.
Computes pairwise correlation of columns in the DataFrame.
Computes pairwise covariance of columns in the DataFrame.
Provides summary statistics for numerical columns in the DataFrame.
Aggregates time-series data into specified frequency buckets.
When to Use
To compute running totals, averages, etc., across the dataset.
To calculate aggregated metrics for grouped data.
To quickly determine the uniqueness of data across columns or rows.
To categorize numeric data into equal-probability bins.
To categorize numeric data into specified intervals.
To identify missing data for further processing.
To handle missing data by filling with defaults or computed values.
To remove incomplete data rows or columns.
To identify duplicate records in data for removal or analysis.
To clean up duplicate entries from data.
To append or combine datasets with similar structure.
To process date and time data effectively.
To handle and convert mixed-type data into numeric format.
To structure raw data into tabular format.
To create one-dimensional labeled data for analysis.
To inspect a quick overview of the dataset.
To inspect the ending rows of the dataset.
To quickly assess data types, null values, and memory usage.
To join data from multiple sources or tables.
To join data where rows closest in time or order should be matched.
To fix or standardize column names.
To create summary tables with aggregated data.
To restructure data for analysis or plotting.
To rearrange data for summary tables or visualizations.
To randomly select rows/columns for testing or validation.
To reorder data based on one or more columns.
To perform custom operations on DataFrame or Series elements.
To calculate differences or align data from previous periods.
To load data stored in CSV format for processing.
To compute moving averages, sums, or other statistics.
To measure relationships between numeric variables.
To assess variability between numeric variables.
To get a quick statistical overview of numerical data.
To downsample or upsample time-series data.
Common Errors
Requires numeric columns for calculations.
Invalid group keys or empty groups.
Non-numeric columns work fine.
Mismatched bins and labels raise errors.
Requires valid intervals and labels.
Works only with DataFrames or Series.
Non-numeric replacements for numeric columns.
Removes all rows/columns if not filtered properly.
Returns only False if no duplicates found.
Mismatched columns in subset.
Index duplication if not handled explicitly.
Invalid formats result in errors.
Non-convertible strings raise errors if not coerced.
Data type mismatches or invalid keys.
Length mismatch between data and index.
n cannot exceed the number of rows in DataFrame.
n cannot exceed the number of rows in DataFrame.
None; works on all DataFrames.
Key mismatches result in empty merges.
Requires sorted input DataFrames.
Key mismatches leave columns unchanged.
Mismatched rows/columns for pivoting.
Mismatched columns in id_vars or value_vars.
Overlapping data in pivot keys raises errors.
n cannot exceed the number of rows in DataFrame.
Invalid column names raise errors.
Incorrect axis or invalid function.
Works only on numeric or datetime data.
File not found or invalid delimiter issues.
Requires valid numeric data.
Non-numeric columns are ignored.
Non-numeric columns are ignored.
Excludes non-numeric columns automatically.
Requires datetime index for resampling.
Output Example
DataFrame with cumulative metrics.
Aggregated DataFrame.
{A: 3, B: 2}
Series with quantile bins.
Series with interval bins.
True/False table for nulls.
DataFrame with NaNs replaced.
Cleaned DataFrame.
[True, False, True, False]
DataFrame without duplicates.
Concatenated DataFrame.
DatetimeIndex of converted dates.
[1.0, 2.0, NaN]
A B \n 0 1 3 \n 1 2 4
a 1 \n b 2 \n c 3
First 3 rows of the DataFrame.
Last 3 rows of the DataFrame.
Table with column info.
Merged DataFrame.
As-of merged DataFrame.
DataFrame with updated names.
Pivot table DataFrame.
Long-format DataFrame.
Pivot table DataFrame.
Sample of 3 rows/columns.
Sorted DataFrame.
Transformed DataFrame/Series.
Shifted DataFrame.
DataFrame with file content.
DataFrame with rolling metrics.
Correlation matrix.
Covariance matrix.
Summary statistics table.
Resampled DataFrame.
Category Function/Command
Aggregation GROUP BY
Aggregation COUNT
Aggregation AVG
Aggregation SUM
Aggregation MIN/MAX
Conditional Logic CASE
Data Conversion CAST
Data Deletion DELETE
Data Insertion INSERT
Data Limiting LIMIT/OFFSET
Data Retrieval SELECT
Data Update UPDATE
Date and Time EXTRACT
Date and Time DATE_ADD
Date and Time DATE_SUB
Date and Time DATEDIFF
Date and Time NOW
Date and Time CURDATE
Filtering WHERE
Lookup XLOOKUP (PostgreSQL Extension)
Null Handling COALESCE
Null Handling NULLIF
Null Handling ISNULL
Null Handling IFNULL
Numeric Function ROUND
Numeric Function FLOOR
Numeric Function CEIL/CEILING
Numeric Function POWER
Pattern Matching LIKE
Ranking RANK
Ranking ROW_NUMBER()
Ranking RANK()
Ranking DENSE_RANK()
Ranking NTILE()
Schema Creation CREATE TABLE
Schema Deletion DROP TABLE
Schema Modification ALTER TABLE
Set Operations UNION
Set Operations UNION ALL
Set Operations INTERSECT
Set Operations EXCEPT
Sorting ORDER BY
String Manipulation SUBSTRING
String Manipulation CHAR_LENGTH
String Manipulation TRIM
String Manipulation CONCAT
String Manipulation REPLACE
Table Combination JOIN
Uniqueness Filtering DISTINCT
Window Function LAG()
Window Function LEAD()
c
SELECT column, AGG_FUNC(column2) FROM table GROUP BY col;
SELECT COUNT(column) FROM table_name;
SELECT AVG(column) FROM table_name;
SELECT SUM(column) FROM table_name;
SELECT MIN/MAX(column) FROM table_name;
CASE WHEN condition THEN result ELSE result END
CAST(expression AS datatype)
DELETE FROM table_name WHERE condition;
INSERT INTO table_name (col1, col2) VALUES (val1, val2);
SELECT * FROM table_name LIMIT number OFFSET number;
SELECT column1, column2 FROM table_name;
UPDATE table_name SET col1 = val1 WHERE condition;
EXTRACT(part FROM date)
DATE_ADD(date, INTERVAL value unit)
DATE_SUB(date, INTERVAL value unit)
DATEDIFF(date1, date2)
NOW()
CURDATE()
SELECT * FROM table_name WHERE condition;
SELECT * FROM table1 WHERE col IN (SELECT col FROM table2)
COALESCE(expr1, expr2, ...)
NULLIF(expr1, expr2)
ISNULL(expression)
IFNULL(expression, default_value)
ROUND(column, decimals)
FLOOR(column)
CEIL(column)
POWER(base, exponent)
SELECT * FROM table_name WHERE column LIKE pattern;
SELECT RANK() OVER (PARTITION BY col ORDER BY col2);
SELECT ROW_NUMBER() OVER (ORDER BY column) AS row_num
SELECT RANK() OVER (PARTITION BY col ORDER BY col2);
SELECT DENSE_RANK() OVER (PARTITION BY col ORDER BY col2);
SELECT NTILE(n) OVER (ORDER BY col) AS bucket
CREATE TABLE table_name (col1 datatype, col2 datatype);
DROP TABLE table_name;
ALTER TABLE table_name ADD/DROP column datatype;
SELECT column FROM table1 UNION SELECT column FROM table2
SELECT column FROM table1 UNION ALL SELECT column FROM table2
SELECT column FROM table1 INTERSECT SELECT column FROM table2
SELECT column FROM table1 EXCEPT SELECT column FROM table2
SELECT * FROM table_name ORDER BY column ASC/DESC;
SUBSTRING(column FROM start FOR length)
CHAR_LENGTH(column)
TRIM([LEADING/TRAILING/BOTH] 'char' FROM column)
CONCAT(string1, string2, ...)
REPLACE(column, old_string, new_string)
SELECT columns FROM table1 JOIN table2 ON condition;
SELECT DISTINCT column FROM table_name;
SELECT LAG(col) OVER (ORDER BY col2) AS prev_value
SELECT LEAD(col) OVER (ORDER BY col2) AS next_value
Usage Example
SELECT age, COUNT(*) FROM students GROUP BY age;
SELECT COUNT(*) FROM students;
SELECT AVG(age) FROM students;
SELECT SUM(salary) FROM employees;
SELECT MIN(age) FROM students;
SELECT name, CASE WHEN age >= 18 THEN 'Adult' ELSE 'Minor' END AS category FROM students;
SELECT CAST(salary AS DECIMAL(10,2)) FROM employees;
DELETE FROM students WHERE age < 18;
INSERT INTO students (name, age) VALUES ('John', 18);
SELECT * FROM students LIMIT 10 OFFSET 5;
SELECT name, age FROM students;
UPDATE students SET age = 19 WHERE name = 'John';
SELECT EXTRACT(YEAR FROM birth_date) FROM employees;
SELECT DATE_ADD('2024-01-01', INTERVAL 10 DAY);
SELECT DATE_SUB('2024-01-01', INTERVAL 15 DAY);
SELECT DATEDIFF('2024-12-31', '2024-01-01');
SELECT NOW();
SELECT CURDATE();
SELECT * FROM students WHERE age > 18;
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE city='Dhaka');
SELECT COALESCE(email, 'No Email') FROM students;
SELECT NULLIF(salary, 0) FROM employees;
SELECT ISNULL(email) FROM students;
SELECT IFNULL(email, 'Not Available') FROM students;
SELECT ROUND(salary, 2) FROM employees;
SELECT FLOOR(price) FROM products;
SELECT CEIL(price) FROM products;
SELECT POWER(2, 3) AS result;
SELECT * FROM students WHERE name LIKE 'J%';
SELECT RANK() OVER (ORDER BY salary DESC) AS rank FROM employees;
SELECT ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank FROM employees;
SELECT RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank FROM employees;
SELECT DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank FROM employees;
SELECT NTILE(4) OVER (ORDER BY salary DESC) AS quartile FROM employees;
CREATE TABLE students (id INT, name TEXT);
DROP TABLE students;
ALTER TABLE students ADD email TEXT;
SELECT name FROM table1 UNION SELECT name FROM table2;
SELECT name FROM table1 UNION ALL SELECT name FROM table2;
SELECT name FROM table1 INTERSECT SELECT name FROM table2;
SELECT name FROM table1 EXCEPT SELECT name FROM table2;
SELECT * FROM students ORDER BY age DESC;
SELECT SUBSTRING(name FROM 1 FOR 3) FROM students;
SELECT CHAR_LENGTH(name) FROM students;
SELECT TRIM(' ' FROM name) FROM students;
SELECT CONCAT(first_name, ' ', last_name) FROM students;
SELECT REPLACE(name, 'a', 'o') FROM students;
SELECT s.name, c.course FROM students s JOIN courses c ON s.id = c.student_id;
SELECT DISTINCT age FROM students;
SELECT salary, LAG(salary) OVER (ORDER BY id) AS prev_salary FROM employees;
SELECT salary, LEAD(salary) OVER (ORDER BY id) AS next_salary FROM employees;
Explanation
Groups rows that share a value and applies aggregate functions like COUNT, SUM, AVG.
Counts rows in a table or non-NULL values in a column.
Calculates the average of a numeric column.
Calculates the total of a numeric column.
Finds the smallest (MIN) or largest (MAX) value in a column.
Evaluates conditions and returns a value for each row.
Converts a value from one data type to another.
Deletes rows from a table based on a condition.
Adds new rows to a table.
Limits the number of rows returned and skips a specific number of rows.
Retrieves specific columns from a table.
Modifies existing rows based on a condition.
Extracts a specific part of a date (e.g., year, month, day).
Adds a specified time interval to a given date, such as days, months, or years.
Subtracts a specified time interval from a date.
Returns the number of days between two dates.
Returns the current date and time.
Returns the current date (without time).
Filters rows based on a condition.
Looks up values in one table based on another table's values.
Returns the first non-NULL value from a list of expressions.
Returns NULL if the two expressions are equal, otherwise returns the first expression.
Checks if a value is NULL and returns 1 if true, 0 otherwise.
Replaces NULL values with a specified default value.
Rounds numeric values to the specified number of decimal places.
Returns the largest integer less than or equal to a number.
Returns the smallest integer greater than or equal to a number.
Returns the result of raising a number to a power.
Searches for patterns in a column using wildcards (%, _).
Assigns a rank to rows based on a specified order.
Assigns a unique number to each row based on a specified order.
Assigns a rank to rows, with ties receiving the same rank and skipping numbers.
Similar to RANK(), but does not skip numbers when there are ties.
Divides rows into n roughly equal-sized groups, assigning each row a group number.
Creates a new table with specified columns and data types.
Deletes a table and all its data permanently.
Adds, drops, or modifies columns in an existing table.
Combines results of two SELECT statements, removing duplicates.
Combines results of two SELECT statements, retaining all duplicates.
Returns only rows common to both queries.
Returns rows from the first query that are not in the second query.
Sorts rows by one or more columns in ascending (default) or descending order.
Extracts a substring from a text column.
Returns the number of characters in a string.
Removes leading, trailing, or both specified characters (default is space) from text data.
Concatenates two or more strings into one.
Replaces all occurrences of a substring with another string.
Combines rows from two tables based on a related column.
Retrieves unique values from a column.
Accesses the value of a column from the previous row in a result set.
Accesses the value of a column from the next row in a result set.
When to Use
To summarize data into groups for analysis.
To measure the size of a dataset or a specific subset.
To find mean values of numeric data.
To compute the total of numeric data.
To identify extremes in numeric or date columns.
To apply conditional logic in SELECT queries.
To enforce data type consistency in queries.
To remove specific or unwanted data.
To populate tables with new data.
To control the result size in paginated queries.
To fetch specific data from a database.
To correct or change data in the table.
To retrieve parts of date-time data for analysis.
To calculate future dates by adding specific intervals to an existing date.
To calculate past dates based on a given date.
To measure the duration between two dates.
To get the system's current timestamp.
To get the system's current date.
To extract rows meeting specific criteria.
To perform lookups or subqueries between related tables.
To replace NULL values with defaults.
To avoid specific values being included in calculations.
To detect and handle NULL values.
To avoid NULL values in query results.
To simplify numeric data for reports or calculations.
To round numbers down for integer operations.
To round numbers up for integer operations.
To perform exponential calculations.
To filter rows based on partial matches in text columns.
To create rankings for analytical purposes.
To create sequential numbering for rows.
To create ranked lists within groups or categories.
To create rankings without skipping numbers.
To categorize data into quartiles, deciles, etc.
To define a new structure for storing data.
To remove obsolete or temporary tables.
To adjust the schema of an existing table.
To combine data from two similar datasets.
To merge datasets without removing duplicates.
To find common data in two datasets.
To exclude specific data from a dataset.
To arrange data in a meaningful order.
To extract specific parts of text data.
To measure the length of text data.
To clean text data by removing unwanted spaces or characters.
To combine multiple text fields into a single value.
To modify specific parts of text data.
To fetch related data spread across multiple tables.
To identify distinct data in a table.
To compare a row's value with the previous row.
To compare a row's value with the next row.
Common Errors
GROUP BY column not in SELECT clause.
NULL values affect results without *.
Non-numeric columns raise errors.
Non-numeric columns raise errors.
Non-numeric columns may produce inconsistent results.
Missing END keyword or invalid syntax.
Invalid conversions (e.g., text to date).
Forgetting the WHERE clause deletes all rows.
Mismatched columns and values count.
Non-numeric values for LIMIT or OFFSET.
Misspelling column or table names.
Forgetting the WHERE clause updates all rows.
Invalid part names cause errors.
Invalid intervals or non-date inputs.
Invalid interval or non-date input.
Invalid date format causes errors.
None; always outputs the system time.
None; always outputs the system date.
Invalid column names in the condition.
Invalid subquery or mismatched data types.
All arguments are NULL.
Works only with two expressions.
Applied to non-existent columns.
Applied to incompatible data types.
Non-numeric columns raise errors.
Non-numeric columns raise errors.
Non-numeric columns raise errors.
Non-numeric arguments raise errors.
Case-sensitive in some databases (e.g., MySQL).
Missing OVER clause results in errors.
Missing OVER clause.
Missing OVER clause.
Missing OVER clause.
Invalid argument (e.g., n <= 0).
Data type mismatches with inserted data.
Table does not exist error.
Dropping non-existent columns.
Different column counts or types in SELECT.
Different column counts or types in SELECT.
Not supported in some databases like MySQL.
Not supported in some databases like MySQL.
Sorting non-numeric columns incorrectly.
Invalid indices or non-string column.
Non-string columns raise errors.
Non-string columns raise errors.
NULL values result in NULL output.
Non-string columns raise errors.
Missing or ambiguous JOIN condition.
None; works with all data.
Missing OVER clause.
Missing OVER clause.
Output Example
Grouped and aggregated results.
Total row count: X.
Average value: X.
Total value: X.
Minimum or maximum value.
Conditional values for each row.
Converted value.
Affected rows: X.
Affected rows: 1.
Limited rows.
name age \n John 18 \n Jane 20
Affected rows: 1.
Extracted part of the date.
'2024-01-11' (if adding 10 days).
Updated date.
Difference in days.
Current date-time.
Current date.
Filtered rows.
Matched rows from the query.
Non-NULL value or default.
NULL or original value.
1 or 0 for each row.
Value or default.
Rounded values.
Floored integers.
Ceiled integers.
Calculated power result.
Filtered rows with matching patterns.
Ranked rows.
Numbered rows.
Ranked rows with ties.
Ranked rows without gaps.
Group number for each row.
Table created.
Table removed.
Table structure updated.
Combined distinct rows.
Combined rows, including duplicates.
Common rows.
Rows unique to the first query.
Sorted rows.
Extracted substring.
Number of characters.
Trimmed text.
Combined text string.
Modified text.
Combined table.
Unique rows from column.
Values from previous rows.
Values from next rows.
Category Chart Type
Comparison Bar Chart
Comparison Radar Chart
Cumulative Area Chart
Cyclic Data Polar Chart
Distribution Histogram
Distribution Box Plot
Distribution Violin Plot
Distribution Density Plot
Flow Waterfall Chart
Flow Sankey Diagram
Geographical Choropleth Map
Hierarchy Treemap
Intensity Heat Map
Intensity Hexbin Plot
Multi-Dimensional Bubble Chart
Performance Bullet Chart
Process Funnel Chart
Proportion Pie Chart
Relationship Scatter Plot
Text Analysis Word Cloud
Trends Line Chart
When to Use
When comparing different categories.
When comparing multiple variables.
When showing cumulative data over time.
When showing cyclic data.
When showing frequency distributions.
When showing data distribution and outliers.
When showing data distribution and density.
When showing data distribution over a continuous interval.
When showing sequential positive or negative values.
When showing flow or transfer of data.
When showing geographical data.
When showing hierarchical data.
When showing data intensity or density.
When showing density of points in a scatter plot.
When showing three dimensions of data.
When showing performance metrics.
When showing stages in a process.
When showing proportions of a whole.
When showing relationships between variables.
When showing text data frequency.
When you need to show trends over time.
Why to Use
To compare values across different categories.
To display performance metrics.
To emphasize the magnitude of data points.
To visualize data points in a circular format.
To visualize the distribution of data.
To compare distributions across different groups.
To visualize the distribution of data and compare across categories.
To visualize the probability density of the data.
To visualize cumulative effects.
To visualize the movement of data.
To visualize data distribution across regions.
To visualize part-to-whole relationships.
To visualize complex data matrices.
To visualize the concentration of data points.
To represent an additional data dimension.
To compare actual performance against targets.
To visualize stages of a process or funnel.
To represent parts of a whole.
To identify correlations or patterns.
To highlight most frequent words.
To visualize continuous data points and trends.
Tools
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Matplotlib, Seaborn
Matplotlib, Seaborn
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Excel, Matplotlib, Seaborn, Plotly
Additional Information
Useful for categorical data, such as sales by region or product.
Shows strengths and weaknesses across different metrics or categories.
Similar to line charts but with filled areas; useful for cumulative totals.
Often used for time-based data like seasonal trends.
Helps in understanding data variability and skewness.
Provides a summary of multiple data points and detects outliers.
Combines aspects of box plot and density plot; useful for comparing distributions.
Similar to histograms but smoother; useful for showing distribution patterns.
Useful for financial and performance analysis to show gains and losses.
Useful for process flow analysis and energy/resource flow.
Uses color shading to represent data values by geographic area; ideal for demographic data.
Uses nested rectangles to represent data; great for representing proportions within a hierarchy.
Uses color gradients to represent values; ideal for correlation matrices.
Aggregates points into hexagonal bins to show density; useful for large datasets.
Combines scatter plot with data size; great for multi-variable analysis.
Combines bar and line charts for comparison; ideal for KPI dashboards.
Useful for sales and marketing analysis, such as sales pipeline stages.
Best for showing percentage distribution; avoid using for too many categories.
Useful for detecting relationships or clusters in data.
Useful for qualitative data analysis and visualizing text frequency.
Ideal for time series data, stock prices, and temperature changes.