Processing

When analyzing MICRESS tabular data with MicPy, most of the work consists of processing the data: selecting individual columns or rows, accessing specific values, filtering, sorting, deriving new columns, and grouping. This section walks through these techniques, all of which rely on standard Pandas operations.

Data structure

When you read a tabular file with MicPy, the data is returned as a Pandas DataFrame object. A DataFrame is a two-dimensional data structure that organizes data into rows and columns, similar to a table or spreadsheet. Each row typically represents an observation, such as a time step in a MICRESS simulation, while each column represents a variable or feature, such as temperature or phase fractions.

Below is an example of a DataFrame containing the results of a MICRESS Delta-Gamma transformation simulation:

	Simulation time [s]	Temperature [K]	Fraction Phase 0 LIQUID	Fraction Phase 1 BCC_A2	Fraction Phase 2 FCC_A1
0	0.0000	1786.00000	1.000000	0.000000	0.000000
1	1.0000	1785.00000	0.998751	0.001249	0.000000
2	2.5000	1783.50000	0.992867	0.007133	0.000000
3	5.0000	1781.00000	0.976858	0.023142	0.000000
4	7.5000	1778.50000	0.959798	0.040202	0.000000
...	...	...	...	...	...
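If you want to try the snippets in this section without a MICRESS result file at hand, a small DataFrame with the same column labels can be built directly with Pandas. This is a sketch that assumes the file read by MicPy yields exactly the labels shown above; the values are the first five rows of the table, not a full simulation result.

```python
import pandas as pd

# Hand-built stand-in for tab.read("A001_Delta_Gamma.TabF"):
# the first five rows of the example table, typed in manually.
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
        "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867, 0.976858, 0.959798],
        "Fraction Phase 1 BCC_A2": [0.000000, 0.001249, 0.007133, 0.023142, 0.040202],
        "Fraction Phase 2 FCC_A1": [0.0, 0.0, 0.0, 0.0, 0.0],
    }
)

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column labels, exactly as written above
print(df.dtypes)         # every column here is stored as float64
```

The default index 0, 1, 2, ... plays the role of the row labels shown in the leftmost column of the table.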

To effectively work with your data, you need to know how to access and manipulate it within the DataFrame. The following sections introduce various methods provided by Pandas for accessing specific subsets of data, selecting individual columns or rows, and retrieving specific values.

Reading the file

To load data from a MICRESS tabular file into a DataFrame, use the read() function from MicPy's tab module. It parses the file and returns a DataFrame object.

from micpy import tab

df = tab.read("A001_Delta_Gamma.TabF")

Selecting columns

Columns in a DataFrame represent different variables or features. You can select individual columns by specifying the column label within square brackets. For example, to extract the temperature column:

temperature = df["Temperature [K]"]

This returns a Pandas Series object, a one-dimensional labeled array holding the selected column together with the DataFrame's row index.

	Temperature [K]
0	1786.00000
1	1785.00000
2	1783.50000
3	1781.00000
4	1778.50000
...	...

To select multiple columns, pass a list of column labels:

phase_fractions = df[["Fraction Phase 0 LIQUID", "Fraction Phase 1 BCC_A2"]]

This returns a DataFrame containing the selected columns:

	Fraction Phase 0 LIQUID	Fraction Phase 1 BCC_A2
0	1.000000	0.000000
1	0.998751	0.001249
2	0.992867	0.007133
3	0.976858	0.023142
4	0.959798	0.040202
...	...	...
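The number of brackets decides the return type: a single label gives a Series, a list of labels (even a list with one label) gives a DataFrame. A minimal sketch with a few hand-typed rows from the example table standing in for the MICRESS file:

```python
import pandas as pd

# Stand-in data: the first three rows of two columns from the example table.
df = pd.DataFrame(
    {
        "Temperature [K]": [1786.0, 1785.0, 1783.5],
        "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867],
    }
)

# A single label returns a Series ...
temperature = df["Temperature [K]"]
# ... while a list, even with one label, returns a DataFrame.
temperature_frame = df[["Temperature [K]"]]

print(type(temperature).__name__)        # Series
print(type(temperature_frame).__name__)  # DataFrame

# A Series supports reductions directly, e.g. the temperature span:
print(temperature.max() - temperature.min())
```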

Selecting rows

Rows in a DataFrame often correspond to time steps. You can select rows by index label with the loc indexer, or by integer position with the iloc indexer.

To select the row with index label 0, which is the first row here, use loc:

first_row_by_label = df.loc[0]

This returns a Series whose index holds the column labels and whose name is the row label 0:

	0
Simulation time [s]	0.0000
Temperature [K]	1786.00000
Fraction Phase 0 LIQUID	1.000000
Fraction Phase 1 BCC_A2	0.000000
Fraction Phase 2 FCC_A1	0.000000

To select the last row, use iloc with the position -1, which counts from the end:

last_row_by_index = df.iloc[-1]

This also returns a Series, here holding the data from the last row (index label 20):

	20
Simulation time [s]	50.0000
Temperature [K]	1751.45000
Fraction Phase 0 LIQUID	0.467904
Fraction Phase 1 BCC_A2	0.259227
Fraction Phase 2 FCC_A1	0.272869

To select multiple rows, pass a list of index labels to loc (or a list of integer positions to iloc):

selected_rows = df.loc[[0, 1, 2]]

This returns a DataFrame with the specified rows.

	Simulation time [s]	Temperature [K]	Fraction Phase 0 LIQUID	Fraction Phase 1 BCC_A2	Fraction Phase 2 FCC_A1
0	0.0000	1786.00000	1.000000	0.000000	0.000000
1	1.0000	1785.00000	0.998751	0.001249	0.000000
2	2.5000	1783.50000	0.992867	0.007133	0.000000
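Two details of loc and iloc often cause confusion: label slices include their end point while position slices do not, and after the rows are reordered, labels and positions no longer coincide. The sketch below illustrates both, using the first five rows of the example table (two columns only) in place of the file read with tab.read().

```python
import pandas as pd

# Stand-in data: first five rows of the example table.
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    }
)

# Label slices with loc include the end label; position slices with iloc do not.
print(len(df.loc[0:2]))   # 3 rows: labels 0, 1, 2
print(len(df.iloc[0:2]))  # 2 rows: positions 0, 1

# After reordering, the row labelled 0 is no longer the first row.
reversed_df = df.sort_values(by="Simulation time [s]", ascending=False)
print(reversed_df.loc[0, "Simulation time [s]"])   # row labelled 0 -> 0.0
print(reversed_df.iloc[0]["Simulation time [s]"])  # first row now -> 7.5
```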

Selecting values

To access a single value within a DataFrame, use the at accessor for label-based access or iat for integer-position-based access.

For instance, to get the temperature value from the first row:

initial_temperature = df.at[0, "Temperature [K]"]

This returns the float value 1786.0.
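The iat accessor reaches the same value by position. A minimal sketch, again with a few hand-typed rows from the example table standing in for the MICRESS file; Index.get_loc() is used to look up a column's position from its label so that the column does not have to be counted by hand.

```python
import pandas as pd

# Stand-in data: first three rows of the example table.
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5],
    }
)

# Label-based: row label 0, column label "Temperature [K]".
initial_temperature = df.at[0, "Temperature [K]"]

# Integer-based: row position 0, column position 1 (the second column).
same_value = df.iat[0, 1]

# Negative positions count from the end, as with Python lists.
temperature_column = df.columns.get_loc("Temperature [K]")
final_temperature = df.iat[-1, temperature_column]

print(initial_temperature, same_value, final_temperature)
```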

Filtering rows

Filtering allows you to select rows that meet specific criteria. This is done using boolean indexing. For example, to select rows where the temperature exceeds a certain threshold:

high_temperature_data = df[df["Temperature [K]"] > 1780]

This returns a DataFrame containing only the rows where the temperature is greater than 1780 K:

	Simulation time [s]	Temperature [K]	Fraction Phase 0 LIQUID	Fraction Phase 1 BCC_A2	Fraction Phase 2 FCC_A1
0	0.0000	1786.00000	1.000000	0.000000	0.000000
1	1.0000	1785.00000	0.998751	0.001249	0.000000
2	2.5000	1783.50000	0.992867	0.007133	0.000000
3	5.0000	1781.00000	0.976858	0.023142	0.000000
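Boolean indexing works in two steps: the comparison builds a boolean Series (a mask) with one entry per row, and indexing the DataFrame with that mask keeps the rows marked True. A sketch with the first five rows of the example table standing in for the MICRESS file:

```python
import pandas as pd

# Stand-in data: first five rows of the example table.
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    }
)

# The comparison produces a boolean Series aligned with the rows of df.
mask = df["Temperature [K]"] > 1780
print(mask.tolist())  # [True, True, True, True, False]

# Indexing with the mask keeps only the rows where it is True;
# the original index labels are preserved.
high_temperature_data = df[mask]

# True counts as 1, so summing the mask counts the matching rows.
print(int(mask.sum()))  # 4
```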

You can combine multiple conditions using the element-wise operators & (AND) and | (OR), enclosing each condition in parentheses. For example, to filter rows where the temperature is between 1750 K and 1760 K, inclusive:

temperature_range_data = df[
    (df["Temperature [K]"] >= 1750) & (df["Temperature [K]"] <= 1760)
]
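For a simple closed range, Series.between() is a compact alternative, and the ~ operator inverts a mask. The sketch below uses a few rows from the head and tail of the example table (with their original index labels) as a stand-in for the full file, so that some rows actually fall inside the 1750 K to 1760 K range.

```python
import pandas as pd

# Stand-in data: rows 0-2 and 17-20 of the example table, labels kept.
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 35.0, 40.0, 45.0, 50.0],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1751.425, 1751.475, 1751.375, 1751.45],
    },
    index=[0, 1, 2, 17, 18, 19, 20],
)

temperature = df["Temperature [K]"]

# Parentheses are required: & binds more tightly than >= and <=.
in_range = (temperature >= 1750) & (temperature <= 1760)

# between() is inclusive on both ends by default, giving the same mask.
print(in_range.equals(temperature.between(1750, 1760)))  # True

# ~ negates a mask, selecting everything outside the range.
outside_range = df[~in_range]
print(df[in_range].index.tolist())   # [17, 18, 19, 20]
print(outside_range.index.tolist())  # [0, 1, 2]
```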

Sorting rows

Sorting rows allows you to reorder your DataFrame based on the values in one or more columns. To sort rows by simulation time in descending order, use the sort_values() method:

sorted_data = df.sort_values(by="Simulation time [s]", ascending=False)

This returns a DataFrame with rows sorted from the latest to the earliest simulation time.

	Simulation time [s]	Temperature [K]	Fraction Phase 0 LIQUID	Fraction Phase 1 BCC_A2	Fraction Phase 2 FCC_A1
20	50.0000	1751.45000	0.467904	0.259227	0.272869
19	45.0000	1751.37500	0.464527	0.259414	0.276058
18	40.0000	1751.47500	0.467859	0.260437	0.271704
17	35.0000	1751.42500	0.461359	0.260637	0.278005
...	...	...	...	...	...

For sorting by multiple columns, pass a list of column names:

sorted_data = df.sort_values(
    by=["Simulation time [s]", "Temperature [K]"], ascending=[False, True]
)

This sorts by simulation time in descending order, then by temperature in ascending order for rows with the same simulation time.
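Sorting returns a new DataFrame whose rows keep their original index labels, so you can either restore the original order with sort_index() or renumber the rows with reset_index(drop=True). A sketch using rows 17 to 20 of the example table as stand-in data:

```python
import pandas as pd

# Stand-in data: rows 17-20 of the example table, labels kept.
df = pd.DataFrame(
    {
        "Simulation time [s]": [35.0, 40.0, 45.0, 50.0],
        "Temperature [K]": [1751.425, 1751.475, 1751.375, 1751.45],
    },
    index=[17, 18, 19, 20],
)

# Sorting reorders rows but keeps their original index labels.
by_temperature = df.sort_values(by="Temperature [K]")
print(by_temperature.index.tolist())  # [19, 17, 20, 18]

# sort_index() restores the original (chronological) order.
restored = by_temperature.sort_index()
print(restored.equals(df))  # True

# reset_index(drop=True) discards the old labels and renumbers from 0.
renumbered = by_temperature.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```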

Creating columns

You can create new columns in a DataFrame by performing arithmetic operations on existing columns. For example, to convert the temperature from Kelvin to Celsius:

df["Temperature [°C]"] = df["Temperature [K]"] - 273.15

This adds a new column Temperature [°C] to your DataFrame, modifying df in place:

	Simulation time [s]	Temperature [K]	Fraction Phase 0 LIQUID	Fraction Phase 1 BCC_A2	Fraction Phase 2 FCC_A1	Temperature [°C]
0	0.0000	1786.00000	1.000000	0.000000	0.000000	1512.85000
1	1.0000	1785.00000	0.998751	0.001249	0.000000	1511.85000
2	2.5000	1783.50000	0.992867	0.007133	0.000000	1510.35000
3	5.0000	1781.00000	0.976858	0.023142	0.000000	1507.85000
4	7.5000	1778.50000	0.959798	0.040202	0.000000	1505.35000
...	...	...	...	...	...	...
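New columns can also be derived from several columns at once. The sketch below sums the phase fractions row by row as a consistency check, and uses assign() to add the Celsius column to a copy instead of modifying df. The column name "Total fraction" is a hypothetical choice for illustration, and the values are three rows (0, 1, and 20) of the example table standing in for the MICRESS file.

```python
import pandas as pd

# Stand-in data: rows 0, 1, and 20 of the example table.
df = pd.DataFrame(
    {
        "Temperature [K]": [1786.0, 1785.0, 1751.45],
        "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.467904],
        "Fraction Phase 1 BCC_A2": [0.000000, 0.001249, 0.259227],
        "Fraction Phase 2 FCC_A1": [0.000000, 0.000000, 0.272869],
    },
    index=[0, 1, 20],
)

# Select all phase-fraction columns by their common prefix.
fraction_columns = [c for c in df.columns if c.startswith("Fraction Phase")]

# Row-wise sum (axis=1): the phase fractions should add up to 1.
df["Total fraction"] = df[fraction_columns].sum(axis=1)

# assign() returns a new DataFrame with the extra column and leaves df unchanged.
# The label contains spaces, so it is passed via dictionary unpacking.
df_celsius = df.assign(**{"Temperature [°C]": df["Temperature [K]"] - 273.15})

print(df["Total fraction"].round(6).tolist())  # [1.0, 1.0, 1.0]
print("Temperature [°C]" in df.columns)        # False
```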

Grouping data

Grouping data allows you to aggregate information based on specific criteria. For example, to calculate the average number of grain neighbors for each time step of a grain growth simulation:

from micpy import tab

df = tab.read("T10_01_GrainGrowth_2D.TabGD")

average_neighbors = df.groupby("Simulation time [s]")["Nb. of Neighbours"].mean()

Simulation time [s]	Nb. of Neighbours
0.0	6.537500
5.0	6.226415
10.0	6.476190
15.0	6.492063
...	...
300.0	6.000000

Here, df holds the grain growth simulation data, with one row per grain and time step. df.groupby("Simulation time [s]") groups the rows by simulation time, and ["Nb. of Neighbours"].mean() averages the number of neighbors within each group. The result is a Series indexed by simulation time.
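A grouped column can also be summarized with several statistics at once through agg(). The sketch below uses hypothetical toy numbers in the same layout (one row per grain per time step), not actual MICRESS output, so the grouping logic can be checked by hand.

```python
import pandas as pd

# Hypothetical toy data: three grains at t = 0 s, two grains at t = 5 s.
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 0.0, 0.0, 5.0, 5.0],
        "Nb. of Neighbours": [6, 7, 5, 6, 8],
    }
)

grouped = df.groupby("Simulation time [s]")["Nb. of Neighbours"]

# mean() gives one value per time step, indexed by simulation time.
average_neighbors = grouped.mean()
print(average_neighbors.to_dict())  # {0.0: 6.0, 5.0: 7.0}

# agg() computes several statistics per group in one call;
# "count" gives the number of grains (non-missing values) per time step.
summary = grouped.agg(["mean", "min", "max", "count"])
print(summary)
```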

Conclusion

The operations covered here (selecting columns, rows, and values; filtering; sorting; creating columns; and grouping) form the basis of most analyses of MICRESS tabular results with MicPy and Pandas. Combining them lets you extract the quantities you need from your simulation results and compare them across time steps. For more advanced data processing, refer to the official Pandas documentation.