Processing¶
When working with MICRESS tabular data using MicPy, understanding how to process the data efficiently is essential. Data processing involves various tasks such as extracting specific subsets of data, selecting individual columns or rows, and accessing specific values within a DataFrame. This section will guide you through the key techniques for processing your MICRESS data to enable detailed analysis and insights.
Data structure¶
When you read a tabular file with MicPy, the data is returned as a Pandas DataFrame object. A DataFrame is a two-dimensional data structure that organizes data into rows and columns, similar to a table or spreadsheet. Each row typically represents an observation, such as a time step in a MICRESS simulation, while each column represents a variable or feature, such as temperature or phase fractions.
Below is an example of a DataFrame containing the results of a MICRESS Delta-Gamma transformation simulation:
| | Simulation time [s] | Temperature [K] | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 | Fraction Phase 2 FCC_A1 |
---|---|---|---|---|---|
0 | 0.0000 | 1786.00000 | 1.000000 | 0.000000 | 0.000000 |
1 | 1.0000 | 1785.00000 | 0.998751 | 0.001249 | 0.000000 |
2 | 2.5000 | 1783.50000 | 0.992867 | 0.007133 | 0.000000 |
3 | 5.0000 | 1781.00000 | 0.976858 | 0.023142 | 0.000000 |
4 | 7.5000 | 1778.50000 | 0.959798 | 0.040202 | 0.000000 |
... | ... | ... | ... | ... | ... |
To effectively work with your data, you need to know how to access and manipulate it within the DataFrame. The following sections introduce various methods provided by Pandas for accessing specific subsets of data, selecting individual columns or rows, and retrieving specific values.
Reading the file¶
To load data from a MICRESS tabular file into a DataFrame, use the read() function from the tab module in MicPy. It parses the file and returns a DataFrame object.
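A minimal sketch of this step follows. The import path and the file name are assumptions, so the MicPy call is shown only as a comment; a small hand-built DataFrame holding the first five rows of the table above stands in for the result, which lets the later steps be tried without a MICRESS file.

```python
import pandas as pd

# With MicPy installed, loading would look roughly like this
# (import path assumed, file name hypothetical):
#     from micpy import tab
#     df = tab.read("results.TabF")

# Stand-in for the returned DataFrame: first five rows of the example table.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867, 0.976858, 0.959798],
    "Fraction Phase 1 BCC_A2": [0.000000, 0.001249, 0.007133, 0.023142, 0.040202],
    "Fraction Phase 2 FCC_A1": [0.0, 0.0, 0.0, 0.0, 0.0],
})

print(df.shape)  # (number of rows, number of columns)
```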
Selecting columns¶
Columns in a DataFrame represent different variables or features. You can select individual columns by specifying the column label within square brackets. For example, to extract the temperature column:
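As a sketch, assuming the data is held in a variable named df (here a hand-built stand-in with two columns from the example table), the selection could look like this:

```python
import pandas as pd

# Stand-in data: first five rows of the example table, two columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
})

# A single label inside square brackets selects one column.
temperature = df["Temperature [K]"]
print(temperature)
```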
This returns a Pandas Series object, which is a one-dimensional array representing the selected column.
| | Temperature [K] |
---|---|
0 | 1786.00000 |
1 | 1785.00000 |
2 | 1783.50000 |
3 | 1781.00000 |
4 | 1778.50000 |
... | ... |
To select multiple columns, pass a list of column labels:
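A short sketch, again using a hand-built stand-in named df (an assumption) with values taken from the example table:

```python
import pandas as pd

# Stand-in data: first five rows of the example table.
df = pd.DataFrame({
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867, 0.976858, 0.959798],
    "Fraction Phase 1 BCC_A2": [0.000000, 0.001249, 0.007133, 0.023142, 0.040202],
})

# Note the double brackets: the outer pair indexes, the inner pair is a list.
fractions = df[["Fraction Phase 0 LIQUID", "Fraction Phase 1 BCC_A2"]]
print(fractions)
```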
This returns a DataFrame containing the selected columns:
| | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 |
---|---|---|
0 | 1.000000 | 0.000000 |
1 | 0.998751 | 0.001249 |
2 | 0.992867 | 0.007133 |
3 | 0.976858 | 0.023142 |
4 | 0.959798 | 0.040202 |
... | ... | ... |
Selecting rows¶
Rows in a DataFrame often correspond to time steps. You can select rows by index label with the loc[] indexer or by integer position with the iloc[] indexer.
To select the first row using loc[]:
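A minimal sketch, assuming the default integer index (so label 0 is the first row) and a hand-built stand-in named df:

```python
import pandas as pd

# Stand-in data: first five rows of the example table, three columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867, 0.976858, 0.959798],
})

# loc[] looks up the row whose index label is 0.
first_row = df.loc[0]
print(first_row)
```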
This returns a Series object with the data from the first row:
| | Simulation time [s] | Temperature [K] | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 | Fraction Phase 2 FCC_A1 |
---|---|---|---|---|---|
0 | 0.0000 | 1786.00000 | 1.000000 | 0.000000 | 0.000000 |
To select the last row using iloc[]:
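A sketch with a hand-built stand-in named df (an assumption). The stand-in has only five rows, so its last row carries index 4 rather than the index 20 of the full simulation output:

```python
import pandas as pd

# Stand-in data: first five rows of the example table, three columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867, 0.976858, 0.959798],
})

# iloc[] counts positions; -1 is the last row regardless of its label.
last_row = df.iloc[-1]
print(last_row)
```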
This also returns a Series object, this time containing the data from the last row.
| | Simulation time [s] | Temperature [K] | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 | Fraction Phase 2 FCC_A1 |
---|---|---|---|---|---|
20 | 50.0000 | 1751.45000 | 0.467904 | 0.259227 | 0.272869 |
To select multiple rows, pass a list of index labels or integer positions:
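Both variants are sketched here on a hand-built stand-in named df (an assumption); with the default integer index, labels and positions coincide, so the two selections give the same rows:

```python
import pandas as pd

# Stand-in data: first five rows of the example table, two columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
})

by_label = df.loc[[0, 1, 2]]      # list of index labels
by_position = df.iloc[[0, 1, 2]]  # list of integer positions
print(by_label)
```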
This returns a DataFrame with the specified rows.
| | Simulation time [s] | Temperature [K] | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 | Fraction Phase 2 FCC_A1 |
---|---|---|---|---|---|
0 | 0.0000 | 1786.00000 | 1.000000 | 0.000000 | 0.000000 |
1 | 1.0000 | 1785.00000 | 0.998751 | 0.001249 | 0.000000 |
2 | 2.5000 | 1783.50000 | 0.992867 | 0.007133 | 0.000000 |
Selecting values¶
To access a single value within a DataFrame, use the at[] accessor for label-based indexing or iat[] for integer-based indexing.
For instance, to get the temperature value from the first row:
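A sketch showing both accessors on a hand-built stand-in named df (an assumption); the column position passed to iat[] depends on the column order of this stand-in:

```python
import pandas as pd

# Stand-in data: first five rows of the example table, two columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
})

# at[] takes (row label, column label).
value = df.at[0, "Temperature [K]"]

# iat[] takes (row position, column position); the temperature is column 1 here.
same_value = df.iat[0, 1]
print(value, same_value)
```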
This returns the float value 1786.0.
Filtering rows¶
Filtering allows you to select rows that meet specific criteria. This is done using boolean indexing. For example, to select rows where the temperature exceeds a certain threshold:
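A minimal sketch with a threshold of 1780 K, using a hand-built stand-in named df (an assumption):

```python
import pandas as pd

# Stand-in data: first five rows of the example table, two columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
mask = df["Temperature [K]"] > 1780
hot = df[mask]
print(hot)
```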
This returns a DataFrame containing only the rows where the temperature is greater than 1780 Kelvin.
| | Simulation time [s] | Temperature [K] | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 | Fraction Phase 2 FCC_A1 |
---|---|---|---|---|---|
0 | 0.0000 | 1786.00000 | 1.000000 | 0.000000 | 0.000000 |
1 | 1.0000 | 1785.00000 | 0.998751 | 0.001249 | 0.000000 |
2 | 2.5000 | 1783.50000 | 0.992867 | 0.007133 | 0.000000 |
3 | 5.0000 | 1781.00000 | 0.976858 | 0.023142 | 0.000000 |
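You can combine multiple conditions using the element-wise logical operators & (AND) and | (OR). Wrap each condition in parentheses, because these operators bind more tightly than comparison operators. For example, to filter rows where the temperature is between 1750 K and 1760 K: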
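One way this could look is sketched here. The stand-in named df (an assumption) combines rows 0 and 4 with rows 17 to 20 from the tables on this page, and treating both bounds as exclusive is also an assumption:

```python
import pandas as pd

# Stand-in data: selected rows from the tables on this page, original labels kept.
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 7.5, 35.0, 40.0, 45.0, 50.0],
        "Temperature [K]": [1786.0, 1778.5, 1751.425, 1751.475, 1751.375, 1751.45],
    },
    index=[0, 4, 17, 18, 19, 20],
)

# Each condition sits in its own parentheses; & combines them row by row.
in_range = df[(df["Temperature [K]"] > 1750) & (df["Temperature [K]"] < 1760)]
print(in_range)
```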
Sorting rows¶
Sorting rows allows you to reorder your DataFrame based on the values in one or more columns. To sort rows by simulation time in descending order, use the sort_values() method:
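A sketch on a hand-built stand-in named df (an assumption). Because the stand-in holds only the first five rows, its sorted output starts at index 4 rather than index 20:

```python
import pandas as pd

# Stand-in data: first five rows of the example table, two columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
})

# ascending=False puts the latest time first; df itself is left unchanged.
latest_first = df.sort_values("Simulation time [s]", ascending=False)
print(latest_first)
```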
This returns a DataFrame with rows sorted from the latest to the earliest simulation time.
| | Simulation time [s] | Temperature [K] | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 | Fraction Phase 2 FCC_A1 |
---|---|---|---|---|---|
20 | 50.0000 | 1751.45000 | 0.467904 | 0.259227 | 0.272869 |
19 | 45.0000 | 1751.37500 | 0.464527 | 0.259414 | 0.276058 |
18 | 40.0000 | 1751.47500 | 0.467859 | 0.260437 | 0.271704 |
17 | 35.0000 | 1751.42500 | 0.461359 | 0.260637 | 0.278005 |
... | ... | ... | ... | ... | ... |
For sorting by multiple columns, pass a list of column names:
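A sketch of a two-key sort follows. The values are illustrative toy data with repeated time stamps, not simulation output, chosen only so that the tie-breaking on the second key is visible; the variable name df is likewise an assumption:

```python
import pandas as pd

# Toy data (not simulation output): two rows share each time stamp.
df = pd.DataFrame({
    "Simulation time [s]": [5.0, 5.0, 2.5, 2.5],
    "Temperature [K]": [1781.0, 1780.0, 1783.5, 1784.0],
})

# One ascending flag per sort key: time descending, temperature ascending.
ordered = df.sort_values(
    ["Simulation time [s]", "Temperature [K]"],
    ascending=[False, True],
)
print(ordered)
```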
This sorts by simulation time in descending order, then by temperature in ascending order for rows with the same simulation time.
Creating columns¶
You can create new columns in a DataFrame by performing arithmetic operations on existing columns. For example, to convert the temperature from Kelvin to Celsius:
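A minimal sketch of the conversion, using a hand-built stand-in named df (an assumption) and the standard offset of 273.15 between the two scales:

```python
import pandas as pd

# Stand-in data: first five rows of the example table, two columns only.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
    "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
})

# Assigning to a new label creates the column; the subtraction is element-wise.
df["Temperature [°C]"] = df["Temperature [K]"] - 273.15
print(df)
```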
This adds a new column, Temperature [°C], to your DataFrame:
| | Simulation time [s] | Temperature [K] | Fraction Phase 0 LIQUID | Fraction Phase 1 BCC_A2 | Fraction Phase 2 FCC_A1 | Temperature [°C] |
---|---|---|---|---|---|---|
0 | 0.0000 | 1786.00000 | 1.000000 | 0.000000 | 0.000000 | 1512.85000 |
1 | 1.0000 | 1785.00000 | 0.998751 | 0.001249 | 0.000000 | 1511.85000 |
2 | 2.5000 | 1783.50000 | 0.992867 | 0.007133 | 0.000000 | 1510.35000 |
3 | 5.0000 | 1781.00000 | 0.976858 | 0.023142 | 0.000000 | 1507.85000 |
4 | 7.5000 | 1778.50000 | 0.959798 | 0.040202 | 0.000000 | 1505.35000 |
... | ... | ... | ... | ... | ... | ... |
Grouping data¶
Grouping data allows you to aggregate information based on specific criteria. For example, to calculate the average number of grain neighbors for each time step of a grain growth simulation:
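A sketch of such an aggregation follows. The per-grain values are made-up toy data (three grains at 0 s, two at 5 s), not simulation output, so the averages differ from those in the table below; only the column names are taken from this page:

```python
import pandas as pd

# Toy data (not simulation output): one row per grain and time step.
df = pd.DataFrame({
    "Simulation time [s]": [0.0, 0.0, 0.0, 5.0, 5.0],
    "Nb. of Neighbours": [6, 7, 5, 6, 8],
})

# Group rows by time step, then average the neighbour count within each group.
mean_neighbours = df.groupby("Simulation time [s]")["Nb. of Neighbours"].mean()
print(mean_neighbours)
```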
Simulation time [s] | Nb. of Neighbours |
---|---|
0.0 | 6.537500 |
5.0 | 6.226415 |
10.0 | 6.476190 |
15.0 | 6.492063 |
... | ... |
300.0 | 6.000000 |
df contains the grain growth simulation data, where each row represents a grain at a specific time step. df.groupby("Simulation time [s]") groups the data by simulation time, and ["Nb. of Neighbours"].mean() calculates the average number of neighbors for each time step.
Conclusion¶
The techniques in this section (selecting columns, rows, and values; filtering; sorting; creating columns; and grouping) cover the fundamental operations for analyzing MICRESS tabular data with MicPy and Pandas. For more advanced data processing tasks, see the official Pandas documentation.